Support PHP stream wrappers

c960657 - February 26, 2008 - 23:25
Project:Drupal
Version:7.x-dev
Component:file system
Category:feature request
Priority:critical
Assigned:Unassigned
Status:needs work
Issue tags:D7FileAPIWishlist, gsoc, gsoc2009, gsoc2009-jmstacey
Description

PHP allows users to implement their own stream wrappers. This makes it possible to access virtual or remote filesystems as if they were on the local disk using fopen(), unlink(), file_get_contents() etc. The files may be stored e.g. as real files on a central server, as virtual files in a database, on remote a storage service like Amazon S3.

Handling uploaded files with Drupal in a webserver cluster is not trivial if the parallel servers do not have a shared directory on disk, e.g. using NFS. Using a stream wrapper administrators can implement their own custom logic that implements a global filesystem in their server cluster. Other uses include the ability to push additional copies of files to e.g. a CDN.

Stream wrappers has a well-defined interface that isn't specific to Drupal, so implementations of such wrappers are useful for other purposes as well.

Stream wrappers work almost like regular files, but only almost. Certain functions do not support stream wrappers, including realpath(), glob(), chmod(), tempnam() etc. So you cannot just insert "mywrapper://mysite.example.com/files" instead of "files" in Admin → Site configuration → File system.

However, with only a few modifications to the Drupal core it works. We currently run several sites in production using stream wrappers for file_directory_path and file_directory_temp. We have done a few modifications to include/file.inc:

  • we added a new function, file_realpath(), that implements realpath() for stream wrappers, and then replaced all occurrences of realpath() with file_realpath()
  • we added @ in front of chmod()
  • we replaced tempnam(realpath(file_directory_temp()), 'tmp_') with file_directory_temp() .'/tmp_'. md5(getmypid() . microtime(true))

We also had to do a few minor tweaks to some contrib modules, but generally things have just worked.

I suggest that Drupal is made stream wrapper friendly. As mentioned, this requires a few changes to file.inc and that developers refrain from using certain filesystem functions.

If people like this idea, I will post patches. Our current solution requires some adjustments in order to be applied to Drupal CVS (e.g. microtime(true) is PHP 5 only).

#1

AjK - February 27, 2008 - 23:33

Just so you know, Drupal 7 will be the first PHP5 only release.

#2

c960657 - April 16, 2008 - 22:22

This patch shows our changes to the Drupal 6 core. I'll whip up a patch against Drupal 7 as well.

Comments are welcome.

AttachmentSize
streamwrapper-D6-1.patch 6.05 KB
Testbed results
streamwrapper-D6-1.patchfailedFailed: Failed to apply patch. Detailed results

#3

c960657 - April 24, 2008 - 19:52
Status:active» needs review

Patch against HEAD.

This has been tested using the Amazon S3 package in PEAR , and file_directory_path set to "s3://mybucket/files" (running this example requires an Amazon S3 account).

require_once 'Services/Amazon/S3/Stream.php';
Services_Amazon_S3_Stream::register('s3',
  array('access_key_id'     => '12345678901234567890',
        'secret_access_key' => '1234567890123456789012345678901234567890' ));

This runs extremely slowly due (a caching layer is needed), but it serves as a proof of concept.

AttachmentSize
streamwrapper-D7-1.patch 7.76 KB
Testbed results
streamwrapper-D7-1.patchfailedFailed: Failed to apply patch. Detailed results

#4

Wim Leers - April 24, 2008 - 21:50

Another interesting patch, c960657. Thanks! And subscribing. Will review when/if I find the time.

P.S.: I wish you had an easier to remember and type nickname though ;)

#5

Susurrus - April 25, 2008 - 00:40

I would think this would be a perfect situation where a unittest would be useful. It seems like all you need to do is create a new class and then access it. Creating a class that just accesses bunk data that would be static between all instances could do this and allow testing for everyone, including users who don't have an S3 account.

I'll see if I can look into that this weekend.

#6

Dries - April 25, 2008 - 18:20

You have my interest with this patch, c960657. Cool stuff.

file_realpath() could use an example and/or some more verbose explanation.

It would be great if you could extend the PHPdoc to capture some of your experience and knowledge. It's probably useful to include some information about stream wrappers to help people understand certain decisions.

I agree with Susurrus that some tests would be key to avoid that we re-introduce changes that might break stream wrappers.

#7

moshe weitzman - April 26, 2008 - 03:27

Subscribe - nice stuff.

Where would you see caching being added for an S3 stream?

#8

c960657 - April 27, 2008 - 01:16

I expanded the comments for file_realpath().

I also added support for resolving "/../" etc. This isn't necessary in a setup like ours, because you cannot escape the S3 bucket using "/../". But in general it is required in order to make file_check_location() work reliably.

The resolver in _file_remove_dot_segments() is based on the pseudo-code in RFC 3986 (about URIs). It is probably possible to do something similar to this using fewer lines of code (e.g. using regular expressions), but it isn't as trivial as it may sound, so for now I decided go with something that is known to work in a secure and reliable way.

WRT testing it would be nice if the whole testsuite for all modules using files could be run both with and without file_directory_path set to a path using a custom stream wrapper. I don't know how to achieve that.

>Where would you see caching being added for an S3 stream?
We have implemented caching in separate stream wrappers. We have file_directory_path set to filecache://dbstatcache://s3://static.example.com/files, where static.example.com is the name of the S3 bucket. Filecache caches files on the local disk of the webserver, but passes all stat() calls on to the dbstatcache. Dbstatcache caches stat information in a database that is accessible by all webservers in the cluster (this is necessary in order to detect if an already cached file is deleted or updated), but passes all fopen() calls on to the s3-wrapper. This may sound overly complicated, and perhaps it is, but the initial idea was to create simple general-purpose components and perhaps release them as open source (so far we have only released the S3 package through PEAR, but we don't mind sharing the other).

So far we only use S3 as a SAN with local caches, but using custom_url_rewrite_outbound() we could serve files directly off S3.

Our solution works, but the speed is not impressive. The end user usually don't feel a difference because of the local cache, but uploading big files is rather sluggish. E.g. ImageCache is not optimized for high-latency filesystems. Improving that is the next step.

AttachmentSize
streamwrapper-D7-2.patch 9.48 KB
Testbed results
streamwrapper-D7-2.patchfailedFailed: Failed to apply patch. Detailed results

#9

chx - April 27, 2008 - 06:47

There are code style issues like } else { also _file_remove_dot_segments is extremely complex, are we sure we need all this?

#10

Susurrus - April 27, 2008 - 16:42

Does realpath() not do the job for removing dots?

#11

c960657 - April 27, 2008 - 17:00

>are we sure we need all this?
Perhaps not. As mentioned, we haven't implemented it in our setup, and so far without problems, so perhaps nothing in core relies on the /../ resolving feature of realpath(). The same can easily be archived using dirname().

file_realpath() could be altered to simply return false on all stream wrapper paths containing /../ rather than trying to resolve them using realpath/file_realpath(). This would still allow file_check_location() to detect potentially dangerous paths.

>Does realpath() not do the job for removing dots?
Yes, but only for regular files that exist on disk. It does not work for stream wrappers or arbitrary strings containing dots. This is the reason for introducing file_realpath().

I have attached a regexp-based approach for removing dot segments, just to show an alternative implementation. It is fewer lines of code, but as mentioned, resolving of dot segments may not be necessary at all.

AttachmentSize
streamwrapper-D7-3.patch 7.64 KB
Testbed results
streamwrapper-D7-3.patchfailedFailed: Failed to apply patch. Detailed results

#12

Dries - April 28, 2008 - 09:55

I suggest that we leave out the more complex dot-handling for now. It's probably better to focus on testing, and to introduce the complex dot-handling when we have a real need for it, and when we can actually write tests for it.

#13

c960657 - April 28, 2008 - 12:48

I updated the patch to simply return FALSE on paths containing "/../". AFAICT this is enough to prevent security exploits (assuming that the wrapper is relatively sane).

I have made file_realpath() less complicated by making it illegal to specify file_directory_path and file_directory_temp with a file:// prefix. I don't see a need to allow two equivalent syntaxes for specifying filesystem paths - it will only introduce another attack vector.

#14

Dries - April 28, 2008 - 17:43

You didn't upload your latest patch.

#15

c960657 - April 28, 2008 - 20:08

Sorry, here it is :-)

AttachmentSize
streamwrapper-D7-4.patch 8.19 KB
Testbed results
streamwrapper-D7-4.patchfailedFailed: Failed to apply patch. Detailed results

#16

Dries - April 30, 2008 - 06:58

Thanks c960657. It looks good to me.

One more question -- could you write some SimpleTests for this (without having to pull in an Amazon S3 library)? It would help ensure this feature gets maintained properly going forward.

#17

c960657 - May 6, 2008 - 21:50

I have added a dummy stream wrapper that may be used for testing.

I added two test classess: FileTestCase and FileStreamWrapperTestCase. The latter inherits from the former and is basically just a way to make sure that all tests are run with and without the dummy wrapper.

I have added tests for some of the functions in file.inc. I'd like to hear some comments on whether this is the right direction before I write further tests. I am new to writing unit tests.

Also, I have made a few more changes to file.inc:

  • file_check_directory() now uses the $mode argument for mkdir() in addition to specifying in afterwords using chmod(), because with stream wrappers chmod() doesn't work. The chmod() call is still necessary, because the $mode parameter of mkdir() is modified by the current umask.
  • file_check_location() now verifies that dirname($source) exists (i.e. $source does not have a stream wrapper prefix and a ".." segment)
AttachmentSize
streamwrapper-D7-5.patch 20.87 KB
Testbed results
streamwrapper-D7-5.patchfailedFailed: Failed to apply patch. Detailed results

#18

c960657 - May 7, 2008 - 17:50

A question about file_directory_temp:

At Administer → Site configuration → File system, the description for the "Temporary directory" text field says: A file system path where uploaded files will be stored during previews. However, AFAICT uploaded files end up in file_directory_path while being previewed. Is the description wrong?

Would it be reasonable to say that file_directory_temp shouldn't be used for saving files between requests? In that case file_directory_temp could be restricted to always point to the local disk, i.e. not use stream wrappers. The purpose is to supply a place on the local disk where native PHP functions that does not work with stream wrappers can write. E.g. imagegif() does not work with stream wrappers, so it would be convenient to write to a temporary file before copying the file to file_directory_path.

#19

moshe weitzman - May 7, 2008 - 18:02

I believe that we keep file uploads in temp dir until a node is saved (for example). thus, the file neds to persist as the user is previewing and improving his node.

#20

c960657 - May 7, 2008 - 18:26

I thought so too. But the upload module appears to save uploaded files directly in file_directory_path (using file_save_upload() in upload_node_form_submit()). The fact that the node hasn't been saved yet is represented using the FILE_STATUS_TEMPORARY flag in the "files" table. When the node is saved, the flag is changed to FILE_STATUS_PERMANENT using file_set_status().

The remaining places in core that use file_save_upload() either discard the temporary file after having read it (includes/locale.inc) or copy it to another location (modules/system/system.admin.inc and modules/user/user.module).

But perhaps I missed something when grep'ing around ... ?

#21

dopry - May 14, 2008 - 02:09

no file_directory_temp is no longer used for file uploads or storing previews. The description on the file system settings page is incorrect.

I conceptually like stream wrappers and the idea of using them. However, I don't see their utility with this implementation.

  • url generation is not addressed
  • access control is not addressed
  • error handling is not addressed.
  • hook_file can cover every use case mentioned except S3 storage.
  • FUSE or JungleDisk can do the same without having to hack realpath.
    for FUSE: http://code.google.com/p/s3fs/wiki/FuseOverAmazon

I mean it would be totally cool to be able to write fileapi as a stream wrapper, but really it's just another layer of abstraction in an already difficult and poorly understood API. It doesn't allow me to do anything I can't already do with more mature tools.

#22

dopry - May 14, 2008 - 02:20

As a systems administrator I could never recommend this as a solution to cluster file distribution when there are file systems like OCFS and NFS freely avaliable, as well as various SANS options that are orders of magnitude more mature.

#23

c960657 - May 14, 2008 - 07:31

I am not sure what you miss WRT url generation, access control and error handling?

The URLs generated by file_create_url() work fine as long as you use private files. Errors generated by custom stream wrappers are handled the same way in PHP as errors from the local filesystem.

The hook_file stuff and s3fs look interesting, though.

#24

dopry - May 15, 2008 - 05:41

@c960657:

re: file_create_url with private files only... so in the S3 use case it has to be read from the remote service and served by drupal... so the file has to move across 2 networks before getting to the client, and the process serving the request has to sit there waiting reducing the number of clients your webserver can support. It would be so much nicer if files could be served directly from the remote service... then access control integration becomes an issue.

re: error handling... right now a lot of file handling happens with the assumption that underlying errors are permanent... ie) if a file can't be stat'ed it doesn't exist... however with stream handlers to a remote service this could be a temporary error.... The api layer above the stream handlers will need to be tolerant of these temporary errors that do not happen with a real filesystem. I'm not sure if it's really an issue, but something that should be considered maybe a new class of soft error for file operations.

I really like the concept of stream wrappers, but I think the implementation has to go deeper to be worth while. If we're going to embrace streams we might as well fully embrace streams for the filesystem... not just one particular stream for the entire FS and we also need to address URL generation and access control, even more so than the error handling in my opinion.

It seems like a much cleaner and more native way of implementing the drivers for my mountpoints / fileapi concept. I already have the mount point management and path parsing layers written.. I haven't revisited it since late Drupal 4.7 or early D5 development so who knows what condition it is in...

I'll think more about and keep playing devil's advocate.... I know I already have my ideas and long term vision, alternatives are welcome... I'm usually pretty sleep deprived when I hop on here.

#25

c960657 - May 15, 2008 - 07:56

>so in the S3 use case it has to be read from the remote service and served by drupal...
>so the file has to move across 2 networks before getting to the client,
Yes, the first time, until the webserver has a copy in its local cache.

>It would be so much nicer if files could be served directly from the remote service...
One way of doing this is using custom_url_rewrite_outbound(). I don't have experience with that yet, though (the possibility of rewriting the outgoing hostname was added in Drupal 6, and our current setup was made before that).

re: error handling:
I don't know much about NFS, but I assume any networked storage may have temporary errors.

>It seems like a much cleaner and more native way of implementing the drivers for my mountpoints
>/ fileapi concept.
I tend to agree. However, using a custom-written stream wrapper, you have greater flexibility WRT to what happens when a file is written, moved etc., and you can write your own local rules in PHP rather than writing your own FUSE wrapper - a thing that most PHP programmers probably aren't familiar with.

>I'll think more about and keep playing devil's advocate....
I appreciate your comments.

We initially chose to use stream wrappers because it seemed like a non-intrusive and non-system-administration-intensive way of supporting multiple servers in a cluster. I am not sure it was the best solution, but at least it works, though not as fast as we would like during uploads (perhaps partly because we also map file_directory_temp to our S3 drive because of the current description as mentioned in #18). Also, the work connected with keeping contrib modules stream wrapper-aware turned out to be bigger than initially expected (though, if stream wrapper support is made official in Drupal core, we can push our module changes upstream).

#26

dopry - May 15, 2008 - 10:36

re: pulling from s3...
it's a cost and I understand caching locally, but it's a big waste of bandwidth, and as you noted the latency issues can be killer. thats why I halted prior S3 work the latency was proving to be a bigger challenge than I anticipated.

re: generating URL's.
if file_create_url were stream aware or had a way of hooking into URL handlers for different streams we could probably solve this in core. just a thought...

re: NFS... not nearly as frequently as S3 or as a result of latency, and it's all kind of fubar on a site when NFS goes awol in a cluster... like ugly bad :) like no document root bad...

re: fileapi/stream wrappers... so I ripped the mountpoint system from fileapi and put it in my sandbox
http://cvs.drupal.org/viewvc.py/drupal/contributions/sandbox/dopry/mount... (D5, should be trivial to port to D7) with a completely untested modification to use stream wrappers instead of the original driver system I was developing for file api... It would be completely dependent on the stream wrappers in core thing, but I see a good match in my head. The contents of mountpoint.module would probably move into file.inc and mountpoint_ui would probably be headed to system.module. With tokenized paths and a smarter file_create_url it could be a fun mix.

re: you'd probably be happier with OCFS or NFS, the administrative overhead is actually minimal if you're already running a cluster, but if it gets the job done... I do still have reservations about file manipulations
and the like... (imagemagick, GD, etc)... It seems like going this route we will be copy files to local tmp storage pretty often until the rest of php's coolness becomes stream aware... I should probably look into that since It will affect a lot of the development work I do.

/me mutters about getting all excited about things he didn't want to get involved in and puts his grumpy hat back in the closet for a bit.

#27

mikl - May 16, 2008 - 15:28

Well, since my esteemed colleague, c960657, is in Riga on vacation, I'd just like to say thanks for the input, dopry. It has certainly given us something to think about.

As c960657 mentioned, getting all this to work with contrib-modules is quite painful – I spend the greater part of wednesday this week trying to upgrade image(api|cache|field) to latest stable (since we have been running -dev releases for a while), and that wasn't all that fun.

In any case, we're not dead set on S3. In fact, http://drupal.org/project/cdn looks promising, but that also involves patching the core :(

#28

c960657 - May 25, 2008 - 20:14

Though NFS (or equivalent services) may be one solution for handling files in a cluster, I think the stream wrapper approach is still worth considering. In particular, I think it is convenient to be able to write your own logic in PHP that suits your particular needs.

NFS also has its issues, so I think it would be useful to have a choice between several different approaches.

I have looked at hook_file, and I fail to see how it covers the same use cases as a stream wrapper. I can see that you are able to hook into file load and save operations, but AFAICT you cannot e.g. use file_scan_directory() to enumerate files not present on the local machine.

Updated patch:

  • Only file_directory_path has stream wrapper support, i.e. we can still use tempnam(), imagegif() etc. on files in file_directory_temp.
  • The outdated description for the file system settings page is updated.
  • New utility function: file_is_wrapper() - useful for modules using native functions without stream wrapper support
  • image_gd_close() now supports stream wrappers
AttachmentSize
streamwrapper-D7-6.patch 24.68 KB
Testbed results
streamwrapper-D7-6.patchfailedFailed: Failed to apply patch. Detailed results

#29

dopry - May 28, 2008 - 01:50

with hook_file but you could guarantee changes across the cluster on hook_file events... no you couldn't use file scan directory in the same way via hook_file, but any FUSE mounted file system could be used the same way.

#30

c960657 - July 7, 2008 - 16:59

Chasing HEAD.

AttachmentSize
streamwrapper-D7-7.patch 24.69 KB
Testbed results
streamwrapper-D7-7.patchfailedFailed: Failed to apply patch. Detailed results

#31

c960657 - July 16, 2008 - 16:28

Chasing HEAD.

AttachmentSize
streamwrapper-D7-8.patch 24.72 KB
Testbed results
streamwrapper-D7-8.patchfailedFailed: Failed to apply patch. Detailed results

#32

lilou - August 23, 2008 - 20:01
Status:needs review» needs work

Need to be re-rolled :

file.test must be moved to modules/simpletest/test/

#33

c960657 - September 18, 2008 - 22:11

Reroll. Still work in progress.

In order to test the patch, I added the following to my settings.php in order to be able to run all existing tests with file_directory_path pointing to a stream wrapper. :

require_once $_SERVER['DOCUMENT_ROOT'] . '/modules/simpletest/tests/file.dummystreamwrapper.inc';
stream_wrapper_register('dummy-stream-wrapper', 'FileDummyStreamWrapper');
$conf['file_directory_path'] = 'dummy-stream-wrapper://sites/example.org/files';

This is probably not the way to go, but at this stage it is probably a good way to ensure thorough testing. I should work on making a regular testcase that test various modules that use files.

There is a failed test here and there, but generally it doesn't look too bad.

AttachmentSize
streamwrapper-D7-9.patch 29 KB
Testbed results
streamwrapper-D7-9.patchfailedFailed: Failed to apply patch. Detailed results

#34

drewish - September 19, 2008 - 07:14

subscribe.

i'm interested in reviewing this further but there's some obvious things that need cleanup. the phpdoc doesn't follow the coding standards and the naming of things seems odd, file_realpath should probably be drupal_realpath, etc.

#35

c960657 - September 19, 2008 - 22:08

The patch is more or less complete, but lacks tests. No ready for review yet.

Instead of the addition to settings.php mentioned in comment #33, I now hacked DrupalWebTestCase::setUp() to set file_directory_path to a path with a wrapper prefix, and file_downloads to FILE_DOWNLOADS_PRIVATE. At the top of boostrap.inc I added code to always register the dummy stream wrapper with PHP. These two changes are included in the patch but are not intended to be committed.

With these two non-standard settings, all existing tests passes except two: 5722 passes, 2 fails, 0 exceptions

Without the two non-standard settings, all existing tests passes.

All tests have been run with the patches from #299290: User tests exceptions and #310113: SimpleTest: assertFieldByXPath() doesn't work in some environments, and with the change from #298309: DatabaseStatement::execute with PDO::FETCH_CLASS calls __construct() after setting object attributes reverted.

I had to change some of the tests to make the compatiable with stream wrappers and/or private files. This may not be necessary, because the tests are expected to run from a default configuration, i.e. without stream wrappers and with public files. But on the other hand it may be convenient to be able to run all tests with these non-standard settings like I have been doing.

AttachmentSize
streamwrapper-D7-10.patch 42.67 KB
Testbed results
streamwrapper-D7-10.patchfailedFailed: Failed to apply patch. Detailed results

#36

c960657 - October 8, 2008 - 18:57

Chasing HEAD.

AttachmentSize
streamwrapper-D7-11.patch 41.11 KB
Testbed results
streamwrapper-D7-11.patchfailedFailed: Failed to apply patch. Detailed results

#37

c960657 - October 9, 2008 - 20:10

Chasing HEAD (following the big hook_file commit).

AttachmentSize
streamwrapper-D7-12.patch 35.26 KB
Testbed results
streamwrapper-D7-12.patchfailedFailed: Failed to apply patch. Detailed results

#38

aaron - October 10, 2008 - 16:27

spoke w/ dopry & drewish in irc. looking at this with an eye of how emfield could benefit from the patch. i'll try to weigh in over the weekend.

#39

c960657 - October 21, 2008 - 17:21

Chasing HEAD.

AttachmentSize
file_create_url-14.patch 16.16 KB
Testbed results
file_create_url-14.patchfailedFailed: Failed to apply patch. Detailed results

#40

c960657 - October 28, 2008 - 18:00

Chasing HEAD.

(please ignore the patch in comment #39 - it belongs to another issue)

AttachmentSize
streamwrapper-D7-14.patch 42.8 KB
Testbed results
streamwrapper-D7-14.patchfailedFailed: Failed to apply patch. Detailed results

#41

c960657 - November 3, 2008 - 21:08

I have added tests for the new functions, drupal_realpath() and file_is_wrapper().

I also added a few litmus test to verify that some of the basic functions are working as expected when file_directory_path is a path with a stream wrapper prefix. Optimally the testbed robot should run the complete testsuite (all 6000 tests or so) with file_directory_path pointing to a directory with a stream wrapper prefix (the same applies for other configuration variables that affect many tests like clean_url's enabled/disabled, $db_prefix empty/non-empty etc. as discussed in comments #5 and #6 in #239542: Daily test runs). This is now possible, because setUp() preserves the existing values of file_directory_path and file_downloads like it currently does for clean_url.

If you want to run the complete suite using the dummy stream wrapper, add the following to settings.php:

require_once DRUPAL_ROOT . '/modules/simpletest/tests/file_dummy_stream_wrapper.inc';
stream_wrapper_register('dummy', 'FileDummyStreamWrapper');

Then go to http://example.org/admin/settings/file-system and change File system path from e.g. sites/example.org/files to dummy://sites/example.org/files, and set the download method to Private files. Then run the tests.

When the original file_directory_path is a regular file, all tests pass on Linux on Windows (except those that fail without the patch). When the original file_directory_path is a path with a stream wrapper prefix, all tests except the two mentioned in comment #35 pass on Linux (still waiting for the results from the Windows box).

AttachmentSize
streamwrapper-D7-15.patch 50.67 KB
Testbed results
streamwrapper-D7-15.patchfailedFailed: Failed to apply patch. Detailed results

#42

c960657 - November 26, 2008 - 21:57

Reroll.

AttachmentSize
streamwrapper-D7-18.patch 51.59 KB

#43

c960657 - November 26, 2008 - 22:02
Status:needs work» needs review

I think this works good enough to get some review.

#44

System Message - November 27, 2008 - 07:35
Status:needs review» needs work

The last submitted patch failed testing.

#45

c960657 - November 27, 2008 - 18:11

It looks like the testbot was complaining about streamwrapper-D7-15.patch, not streamwrapper-D7-18.patch. Here is a fresh reroll that passes all tests on my box.

AttachmentSize
streamwrapper-D7-19.patch 50.79 KB
Testbed results
streamwrapper-D7-19.patchfailedFailed: Failed to apply patch. Detailed results

#46

c960657 - November 27, 2008 - 18:11
Status:needs work» needs review

#47

dopry - November 30, 2008 - 09:29

-    if (($mode & FILE_CREATE_DIRECTORY) && @mkdir($directory)) {
+    if (($mode & FILE_CREATE_DIRECTORY) && @mkdir($directory, 0775)) {

The following chmod will become useless with your edit. It is also scope creep for what you are implementing in this patch. Either leave it alone or update it the logic around directory creation and error reporting.

     // Set standard file permissions for webserver-generated files
-    @chmod(realpath($image), 0664);
+    @chmod($image, 0664);

Why is real path removed above? It's also removed from several places in the Tests is this purposeful? Should it have been changed to drupal_realpath?

   function assertFilePermissions($filepath, $expected_mode, $message = NULL) {
+    // File permissions are not supported by stream wrappers
+    if (file_is_wrapper($filepath)) {
+      return;
+    }

Should we be somehow checking permissions on the stream wrapped files? It seems naieve to just skip the permissions check because it's stream wrapped. Can we require stream wrappers used with drupal to implement stat?

#48

c960657 - November 30, 2008 - 12:37

The following chmod will become useless with your edit.

True. I have now removed the chmod(). The reason for setting the permissions with mkdir() is that chmod() does not work with stream wrappers.

Why is real path removed above? It's also removed from several places in the Tests is this purposeful? Should it have been changed to drupal_realpath?

Filesystem functions automatically resolves relative file references. Without stream wrappers, adding an extra realpath() is redundant. Adding an extra drupal_realpath() is redundant, with or without stream wrappering. Some functions give slightly different error messages if the specified file does not exist, but that is the only difference AFAIK.

Should we be somehow checking permissions on the stream wrapped files? It seems naieve to just skip the permissions check because it's stream wrapped.

The thing is that chmod() does not support stream wrappers. For some reason it is only possible to read stat information on stream-wrapped files, but not to modify it. Hopefully this will be fixed in future versions of PHP. The only exception (quite strange) is that you can specify permissions on new directories with mkdir().

Calling chmod() on a stream-wrapped file and then testing the outcome using fileperms() will always fail (unless the permissions already had the requested value). If Drupal's SimpleTest supported a "skip" result in addition to "pass" and "fail" like e.g. PHPUnit, this would have been the proper status to return. Alternatively we could just let the tests fail, but I think it is better to omit them so they don't remove people's attention from other failures.

As long as all files are to be saved with the same permissions, people could make their stream wrapper implementation apply default permissions on all new files. The default permissions could be hardcoded inside the stream wrapper or set in the wrapper's default stream context e.g. in settings.php (a similar approach would probably be required, if the files are saved on e.g. S3, NTFS or other filesystems whose permission models are much different from the one supported on Linux and used my chmod/chown).

#49

c960657 - November 30, 2008 - 12:38
AttachmentSize
streamwrapper-D7-20.patch 49.26 KB
Testbed results
streamwrapper-D7-20.patchfailedFailed: Failed to apply patch. Detailed results

#50

dopry - November 30, 2008 - 20:08
Status:needs review» needs work

The directory creation logic is still incorrect.. failed mkdir's will not report an error...

#51

c960657 - November 30, 2008 - 21:55
Status:needs work» needs review

D'oh! Sorry. I hope I got it right this time.

AttachmentSize
streamwrapper-D7-21.patch 49.31 KB
Testbed results
streamwrapper-D7-21.patchfailedFailed: Failed to apply patch. a href=http://testing.drupal.org/pifr/file/1/streamwrapper-D7-21.patchDetailed results/a

#52

naxoc - December 1, 2008 - 16:05

subscribe

#53

System Message - December 3, 2008 - 12:45
Status:needs review» needs work

The last submitted patch failed testing.

#54

c960657 - December 4, 2008 - 17:40
Status:needs work» needs review

Reroll.

AttachmentSize
streamwrapper-D7-22.patch 40.88 KB
Testbed results
streamwrapper-D7-22.patchfailedFailed: 7420 passes, 0 fails, 2 exceptions Detailed results

#55

System Message - December 4, 2008 - 17:55
Status:needs review» needs work

The last submitted patch failed testing.

#56

c960657 - December 4, 2008 - 22:02
Status:needs work» needs review

Sorry, the patch was missing file_dummy_stream_wrapper.inc (I wonder why this only caused two exceptions?)

AttachmentSize
streamwrapper-D7-23.patch 48.4 KB
Testbed results
streamwrapper-D7-23.patchfailedFailed: Failed to apply patch. Detailed results

#57

System Message - December 8, 2008 - 17:15
Status:needs review» needs work

The last submitted patch failed testing.

#58

c960657 - December 8, 2008 - 21:13

Reroll.

AttachmentSize
streamwrapper-D7-24.patch 48.46 KB
Testbed results
streamwrapper-D7-24.patchfailedFailed: Failed to apply patch. Detailed results

#59

c960657 - December 8, 2008 - 21:17
Status:needs work» needs review

#60

chx - December 9, 2008 - 16:28
Priority:normal» critical
Status:needs review» needs work

I would use

<?php
function file_is_wrapper($path) {
 
// A wrapper prefix is at least two characters so that it can be distinguised
  // from a Windows drive letter, "C:/temp"?.
 
preg_match('@^(?<wrapper>[a-z0-9.+-]{2,})://@i', $path, $matches);
  return
$matches ? $matches['wrapper'] : '';
}
?>

As empty strings casted to bool are FALSE this allows you to use it as you used in the patch but also to reuse it whenever you need the name wrapper as well.

Also, I think you got the importance of this feature wrong. Fixed :)

#61

drewish - December 9, 2008 - 17:10

Finally had some time to dig into this, I think this would be a very cool feature. Here are my comments:

  1. file_directory_temp() and drupal_realpath() have bad PHPDoc block. The first paragraph should be a single sentence describing the function. Additional details should be in a subsequent paragraph.
  2. I don't think FileStreamWrapperTest::setUp() should be calling file_test_reset();
  3. FileStreamWrapperTest::testFileFunctions() is doing way to much in one function. It needs to be broken up into several smaller test functions. Actually that goes for all of FileStreamWrapperTest... I'd almost rather see you adding additional test cases to the existing test classes that specifically used stream objects.
  4. FileDummyStreamWrapper's PHPDoc block should start with /** (the second star is important) and has the same problem as file_directory_temp() in that it needs to start with a single sentence description.
  5. I'm not so sure about the file_test_init().. can't we do a drupal_include there?
  6. In image_gd_close() it's nit picky but $ok seems like a bad variable name. I'd look through core and see what else is used for holding a return code like that.

#62

chx - December 9, 2008 - 17:40

I am looking at the

<?php
-    @chmod(realpath($image), 0664);
+   @
chmod($image, 0664);
?>

and similar and wonder "is this secure"?

#63

c960657 - December 10, 2008 - 00:12

This addresses the points raised in #60 and #61.

I discussed how to write tests with drewish on IRC. Now tests for stream wrapper support in some function, foo(), is added to the class that tests that function. So in addition to testFoo(), another function testFooWithStreamWrapper() is added. See the patch for a few examples. This is still work in progress.

Since the stream wrapper tests are now no longer limited to the file tests, I added drupal_web_test_case::useStreamWrapper() instead of FileStreamWrapperTest::registerStreamWrapper() (from the previous patch). I moved file_dummy_stream_wrapper.inc into /includes in order to be able to autoload it (if it is part of the SimpleTest module it isn't available for autoload when invoked from the SimpleTest browser AFAICT).

Also, the dummy wrapper prefix is not registered when invoked from the SimpleTest browser. I fixed this by adding a temporary hack to file_directory_path(). This is only necessary until #329177: Get rid of ugly simpletest regexp. lands.

I also discussed #62 with chx. He was worried that removing realpath() would introduce security exploits. I don't think it will, but please consider this carefully when reviewing the patch.

AttachmentSize
streamwrapper-D7-25.patch 51.26 KB
Testbed results
streamwrapper-D7-25.patchfailedFailed: Failed to apply patch. Detailed results

#64

c960657 - December 12, 2008 - 00:15
Status:needs work» needs review

Added more tests. I think all file-related tests now have a stream wrapper counterpart.

I modified some of the existing tests to use the files supplied by drupal_web_test_case::drupalGetTestFiles() rather than misc/druplicon.png etc. so they can test stream wrappers just by setting file_directory_path() to a path with stream wrapper prefix prior to running the test.

The hack in file_directory_path() mentioned in #63 has been moved to _drupal_bootstrap(), but this is still just a temporary measure until #329177: Get rid of ugly simpletest regexp. has been fixed.

Don't be alarmed about the size of the patch. The actual change to core is rather limited. The majority is test-related.

AttachmentSize
streamwrapper-D7-26.patch 76.46 KB
Testbed results
streamwrapper-D7-26.patchfailedFailed: 7996 passes, 1 fail, 0 exceptions Detailed results

#65

System Message - December 13, 2008 - 21:45
Status:needs review» needs work

The last submitted patch failed testing.

#66

c960657 - December 13, 2008 - 23:55

I cannot reproduce the test failure reported by the testing bot. Any ideas?

#67

chx - December 14, 2008 - 12:29
Status:needs work» needs review

There has been fixes. I request a re-test.

#68

System Message - December 14, 2008 - 13:10
Status:needs review» needs work

The last submitted patch failed testing.

#69

c960657 - December 14, 2008 - 20:44
Status:needs work» needs review

Found the problem.

AttachmentSize
streamwrapper-D7-27.patch 76.47 KB
Testbed results
streamwrapper-D7-27.patchfailedFailed: Failed to apply patch. Detailed results

#70

System Message - December 18, 2008 - 03:55
Status:needs review» needs work

The last submitted patch failed testing.

#71

c960657 - December 18, 2008 - 19:15
Status:needs work» needs review

Reroll.

#72

c960657 - December 18, 2008 - 19:16
AttachmentSize
streamwrapper-D7-28.patch 69.02 KB
Testbed results
streamwrapper-D7-28.patchfailedFailed: 5240 passes, 2902 fails, 842 exceptions Detailed results

#73

c960657 - December 18, 2008 - 19:21
Status:needs review» needs work

#74

drewish - December 18, 2008 - 19:22
Status:needs work» needs review

c960657, at one point (on irc?) you'd said you've got an s3 stream wrapper. would you mind posting that? i'd love to actually test this with a "real" wrapper.

also i thought you were going to move the test stream wrapper init code out of _drupal_bootstrap()?

#75

drewish - December 18, 2008 - 19:23
Status:needs review» needs work

i guess i cross posted you.

#76

c960657 - December 18, 2008 - 22:18

We use the stream wrapper in the Services_Amazon_S3 PEAR package (maintained by me).

In addition we use two cache layers: one local file cache on each web server, and a shared cache in the database for stat data (so that if a file is e.g. deleted on one server, the other servers know that their cached copy is stale). Our file_directory_path is filecache://dbstatcache://s3://bucketname/files. It may look insane, but the idea is to separate the individual layers and create some general-purpose wrappers that can be submitted to e.g. PEAR, though I never got around to doing that (but if you prefer, you can of course combine S3 access and caching in one wrapper). We have combined all this in a custom module (currently not compatible with D7).

also i thought you were going to move the test stream wrapper init code out of _drupal_bootstrap()?

I am waiting for #329177: Get rid of ugly simpletest regexp. to land. For now I have moved it to the bottom conf_init(), the same place as where #329177 hooks.

AttachmentSize
streamwrapper-D7-29.patch 76.74 KB
Testbed results
streamwrapper-D7-29.patchfailedFailed: Failed to apply patch. a href=http://testing.drupal.org/pifr/file/1/streamwrapper-D7-29.patchDetailed results/a

#77

c960657 - December 18, 2008 - 23:03
Status:needs work» needs review

#78

drewish - January 3, 2009 - 02:04
Status:needs review» needs work

the patch doesn't apply. i started to re-roll it but got bogged down and ran out of time fixing file.test. the thing that still bothers me about this patch is that it adds in an additional case to every function... really wish we could do everything as a stream or none of it because it just seems like there's bound to be bugs lurking in there.

that said i went a head a refactored the tests so that the stream tests extend each test case and the setup function calls $this->useStreamWrapper(); this way every test is run for both version. it's less efficient but at least we can be sure that both paths work correctly.

i'd merged the tests and in the process of creating a patch managed to delete all the changes but had a copy of file.test kicking around so i'm attaching that.

AttachmentSize
file.test.txt 52.98 KB

#79

drewish - January 5, 2009 - 00:45

#80

c960657 - January 6, 2009 - 22:19

Reroll based on attachment to #78.

really wish we could do everything as a stream or none of it because it just seems like there's bound to be bugs lurking in there.

I.e. make file_directory_path() always return a wrapper prefix, possibly file://? I guess that is feasible. Note that file_unmanaged_* would still have to support regular filesystem paths in order to work with e.g. file_directory_temp.

AttachmentSize
streamwrapper-D7-30.patch 69.61 KB
Testbed results
streamwrapper-D7-30.patchfailedFailed: Failed to apply patch. a href=http://testing.drupal.org/pifr/file/1/streamwrapper-D7-30.patchDetailed results/a

#81

drewish - January 7, 2009 - 00:08

cool, i'll try to give this a close review tonite.

#82

c960657 - January 10, 2009 - 13:25
Status:needs work» needs review

Reroll (due to #334303: Handling overwriting of managed files (with unittests!)).

AttachmentSize
streamwrapper-D7-31.patch 67.7 KB
Testbed results
streamwrapper-D7-31.patchfailedFailed: Failed to apply patch. a href=http://testing.drupal.org/pifr/file/1/streamwrapper-D7-31.patchDetailed results/a

#83

dopry - January 11, 2009 - 09:11

I'd like to note ongoing work as a part of the media sprint is stream wrapper support. The current work can be found in contributions/modules/stream. It includes a public:// and private:// stream wrappers by default and a basic set of tests.

http://www.drupal.org/project/media
http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/media/resource

The work is much more invasive than this patch so I am implementing it in contrib with a replacement for the current file api that leverages the stream wrapper support more fully. It is a bottom up OO rewrite of how non-database local and remote resources are managed.

#84

System Message - January 14, 2009 - 05:52
Status:needs review» needs work

The last submitted patch failed testing.

#85

andreiashu - February 3, 2009 - 00:39

great post !!! subscribing

#86

c960657 - February 15, 2009 - 15:59
Status:needs work» needs review

Reroll.

AttachmentSize
streamwrapper-D7-32.patch 61.43 KB
Testbed results
streamwrapper-D7-32.patchfailedFailed: 6193 passes, 4584 fails, 908 exceptions Detailed results

#87

System Message - February 15, 2009 - 16:55
Status:needs review» needs work

The last submitted patch failed testing.

#88

c960657 - February 15, 2009 - 18:05
Status:needs work» needs review

Patch missed a file.

AttachmentSize
streamwrapper-D7-33.patch 67.74 KB
Testbed results
streamwrapper-D7-33.patchfailedFailed: Failed to apply patch. Detailed results

#89

kwinters - February 18, 2009 - 18:45

Paths like "../logs/error_log" could potentially be sensitive and don't contains "/../" (but a path that ended with "/.." would always be a directory, not sure if that's OK in this context, rm -rf would be bad).

#90

System Message - February 20, 2009 - 07:35
Status:needs review» needs work

The last submitted patch failed testing.

#91

c960657 - February 24, 2009 - 00:46
Status:needs work» needs review

Reroll (due to #380064: DX: Make file_scan_directory() use save property names as file_load()).

AttachmentSize
streamwrapper-D7-34.patch 68 KB
Testbed results
streamwrapper-D7-34.patchfailedFailed: Failed to apply patch. Detailed results

#92

jhedstrom - March 4, 2009 - 23:02

Subscribing.

#93

System Message - March 9, 2009 - 12:30
Status:needs review» needs work

The last submitted patch failed testing.

#94

c960657 - March 17, 2009 - 20:59
Status:needs work» needs review

Reroll.

AttachmentSize
streamwrapper-D7-35.patch 68.34 KB
Testbed results
streamwrapper-D7-35.patchfailedFailed: Failed to apply patch. Detailed results

#95

gordon - March 26, 2009 - 02:05

subscribe

#96

System Message - March 26, 2009 - 04:30
Status:needs review» needs work

The last submitted patch failed testing.

#97

ghoti - April 6, 2009 - 17:42

<aol>me too!</aol>

#98

bjaspan - April 18, 2009 - 13:10

subscribe

#99

drewish - April 19, 2009 - 21:30

just a note that #203204: Uploaded files have the permissions set to 600 was committed which adds a drupal_chmod().

#100

c960657 - April 21, 2009 - 17:37

Reroll.

AttachmentSize
streamwrapper-D7-36.patch 67.28 KB
Testbed results
streamwrapper-D7-36.patchfailedFailed: Failed to apply patch. Detailed results

#101

c960657 - May 4, 2009 - 21:48

Reroll.

AttachmentSize
streamwrapper-D7-37.patch 65.86 KB
Testbed results
streamwrapper-D7-37.patchfailedFailed: Failed to apply patch. Detailed results

#102

eojthebrave - May 4, 2009 - 23:09

This seems like it has a lot of potential. Subscribing, and after a browsing through the patch real quick here are a couple of things I saw.

+ * Get the stream wrapper prefix from of path.
+ *
+ * @param $path
+ *   A string containing a path to a file or directory.
+ * @return
+ *   The wrapper prefix, e.g. "foo" for the path "foo://bar.txt", or FALSE is
+ *   $path is a regular filesystem path.
+ */

contains a few typos.

Should be.

+ * Get the stream wrapper prefix of a path.
+ *
+ * @param $path
+ *   A string containing a path to a file or directory.
+ * @return
+ *   The wrapper prefix, e.g. "foo" for the path "foo://bar.txt", or FALSE if
+ *   $path is a regular filesystem path.
+ */

There is a weird character in this line.

+      'name' => t('File paths and directories – with stream wrappers'),

Not sure what it is supposed to be.

Looks like the weird character is also here

+      'name' => t('Unmanaged file delete – with stream wrappers'),

and here

+      'name' => t('Unmanaged file copying –with stream wrappers'),

and here

+      'name' => t('File saving – with stream wrappers'),

and here

+      'name' => t('Realpath resolving – with stream wrappers'),

#103

c960657 - June 3, 2009 - 17:44
Status:needs work» needs review

Reroll with typos fixed and weird characters removed (the weird character was actually an en dash, but a regular hyphen is probably better after all).

AttachmentSize
streamwrapper-D7-38.patch 66.33 KB
Testbed results
streamwrapper-D7-38.patchfailedFailed: Failed to apply patch. Detailed results

#104

System Message - June 5, 2009 - 21:40
Status:needs review» needs work

The last submitted patch failed testing.

#105

webchick - June 5, 2009 - 22:12
Status:needs work» needs review

HEAD broke.

#106

jmstacey - June 8, 2009 - 14:45

Subscribing.

#107

aaron - June 8, 2009 - 15:15

I plan to jump back into this issue, hopefully this week.

First off, I'd like to note that dopry's work at http://drupal.org/project/media (with the stream wrappers in its included Resource module) has been invaluable. Anyone interested in seeing PHP stream wrappers in core should take a look at that.

I plan to take a look at the current status again. I imagine it's killing a lot of puppies, but I see no way around that to at least get the initial jump start needed. However, once we have the basic engine in place, we'll be able to do some really exciting things.

#108

aaron - June 8, 2009 - 16:49

FYI, I'm mentoring Jon Stacey (jmstacey), a student for GSOC, who's developing the Media module. We're changing tracks of that project slightly, to develop in d7 rather than d6, because there are things in core that will make the project easier to bring to fruition.

We'd like to see more, including this patch. However, the terms of a Drupal project w/ GSOC are that it can't be dependent on making commits in core.

That said, we've decided to focus the next week or two on core patches that would help things along. Specifically this patch, and probably another patch after this to make related changes to hook_file that take stream wrappers into consideration.

Jon's getting up to speed on this patch, as well as myself (I haven't looked at it in a few months).

I'm not asking anyone to vet the work or anything, and am definitely not asking that we speed up the process of adoption -- obviously we want to ensure the best work possible goes forward, and don't want to make any mistakes by committing something that's not quite ready.

However, I would be eternally grateful if interested folks could give a few minutes later in helping to review things, and help us find blocking issues.

Thanks,
Aaron Winborn

#109

drewish - June 8, 2009 - 20:41

i think there's a few issues that this ties in with that all need to be resolved at once. the thing i don't like about this patch is that it lets us use normal paths or stream wrapped paths. as i've said before i think we should pick one or the other. if say that we're only going to allow stream wrapped paths we've got an approach that could help solve several issues: #366464: Removing file_directory_path() in filepath in file table #166759: Public/Private File Handling #329301: Rename {files} to {file} and add unique key to the filepath column. i think that dopry's work on the resources module had some very promising work started. he had separate stream wrappers for public, private and temp files. i think we need to have separate directories for each and which would let us keep the full path out of the database table and still allow us to finally get a unique key on the filepath column.

#110

justinrandell - June 9, 2009 - 14:38

subscribe, coming from #166759: Public/Private File Handling.

#111

jmstacey - June 9, 2009 - 20:59

Aaron, drewish, and I had a chat this morning and the three of us have agreed to a tentative plan that I'll try to describe with our reasoning below.

The bottom line is that we need a unified resource handling system. We have an opportunity now to take care of several issues in one swoop. Ultimately, we envision core providing three basic stream wrappers: public://, private://, and temp://. This alone will bring additional functionality to core such as the ability to use both public and private filesystems at the same time. Additionally, this paves the way towards allowing contrib modules to provide extended functionality (youtube://, s3:// ...), and yet be built on a common base.

As drewish previously mentioned, this would take care of a couple issues we're facing:

#166759: Public/Private File Handling - This would be bundled up neatly once stream wrappers are implemented. The registered stream wrapper (public, private, temp,...) simply looks up the configured path based on the URI.

#366464: Removing file_directory_path() in filepath in file table - This issue can be cleaned up because the wrapper would determine the path. For example, the resource public://friends/tom/london.jpg would be handed off to the public stream wrapper which would consult the configuration settings and look in, for example, sites/default/files/friends/tom/london.jpg. A resource private://family/jennifer/wedding.jpg would be handed off to the private stream wrapper and look in say /home/bob/private_files/family/jennifer/wedding.jpg.

To accomplish this goal we need to concurrently tackle several areas:

We need to merge this patch with the work done by dopry on resource.module (currently part of Media).

#329301: Rename {files} to {file} and add unique key to the filepath column - We propose the following changes:

  • New table {file}. Identical to {files} except for the changes below.
  • 'filepath' renamed to 'uri'. Contains full URI (with wrapper). As files are migrated, their paths will be trimmed and the appropriate wrapper prepended (public://example_file.txt).
  • Index uri column. Improved query performance.
  • Remove 'status' column. The relevant wrapper (public, temp, etc.) will determine this. [pending full evaluation]

Upgrade path: We propose leaving the files table in place until all modules have migrated. We would provide helper functions to assist in migration. For example, given an old filepath return a new D7 file object that can be passed on to the appropriate storage mechanism. As a result of the proposed database schema changes we would need to provide patches for core modules that rely on the files table. At this time we are only aware of upload.module and filefield needing attention, but we will need a comprehensive evaluation.

How you can help
Getting this done will lay the groundwork for a lot of incredible new features, all built on a common foundation. However, there is a lot to accomplish and the more eyes the better. I'll be working on this full-time for the next couple of weeks. We need help from every area so drop by #drupal (irc) and ping one of us if you have the time.

#112

justinrandell - June 9, 2009 - 21:42

jmstacey: awesome summary.

i'm beejeebus in IRC, i'll be online saturday and sunday, see you then.

#113

System Message - June 10, 2009 - 02:50
Status:needs review» needs work

The last submitted patch failed testing.

#114

jmstacey - June 11, 2009 - 05:11
Issue tags:+gsoc, +gsoc2009, +gsoc2009-jmstacey

Just a small update. I've created a fork of mikl's GitHub drupal mirror for use in rapid collaborative work on the patch: http://github.com/jmstacey/drupal/tree/master

At this point the fork consists of streamwrapper-D7-38.patch, merged to drupal head, and the stream wrapper interface and registry implementation from dropyr's work on resource.module (+coding standards and doc improvements). Tomorrow I'll be fixing up tests to make sure that basic stream wrapper support actually works, and then I'll start working on file API.

Word on the street is that there will be a mini-sprint this weekend with drewish and justinrandell. If you're interested in helping out, ping me and I'll provide commit access to the development branch.

#115

jmstacey - June 13, 2009 - 00:58

Here's a checkpoint so that cursory reviews may begin. There may be regressions from streamwrapper-D7-38.patch because of a different design.

Changes so far:

  • Stream wrapper registry. Provides a way of determining which wrapper should handle a request for functions that go beyond PHP's built-in stream wrapper support (e.g. drupal_chmod). Original work from dopry.
  • Three core stream wrappers added (public, private, and temp). Original work from dopry.
  • Unit testing of wrapper registry register, unregister, and availability of core wrappers.
  • drupal_realpath() now hands responsibility to the registered wrapper if a valid scheme is detected. Fallback to PHP's realpath if URI does not contain a registered scheme. Core refactored.
  • drupal_chmod() now hands responsibility to the registered wrapper. Fallback to PHP chmod() if URI does not contain a registered scheme. Core refactored.
  • Helper functions to return a URI scheme and an appropriate wrapper instance (stream_scheme(), stream_wrapper() respectively).

Other notes:

  • The administrative interface and variables to support the three stream wrappers are not implemented.
  • drupal_tempnam() added, but uncertain implementation. Core not refactored.
  • Stream wrapper unit tests pending File Interface changes.
  • This patch is still a work in progress

Coming up:

  • Database changes and upgrade path
  • File interface overhaul
  • Core module changes
AttachmentSize
227232-stream_wrappers-inprogress-1.patch 34.33 KB
Testbed results
227232-stream_wrappers-inprogress-1.patchfailedFailed: Failed to apply patch. Detailed results

#116

justinrandell - June 13, 2009 - 13:37

updated patch is just a rearrangement of the patch at #115 so it can be applied to a clean checkout of HEAD with -p0 - no code changes.

AttachmentSize
stream_wrappers.patch 34.15 KB
Testbed results
stream_wrappers.patchfailedFailed: Failed to apply patch. Detailed results

#117

justinrandell - June 13, 2009 - 16:54

@jmstacey: i've got a git clone of your github drupal mirror, ping me in #drupal when you get a chance. i'm a git n00b, so i'll need some pointers re. pushing changes back up.

currently just trying to get basic user picture and upload module functionality to use the stream wrappers and see what works/doesn't work.

#118

drewish - June 13, 2009 - 20:55

Here's a summary/export of what we got done today. If you'd like to see it in more gory detail check out jmstacey's drupal github repo.

drewish:
- Removing all the trailing white space.
- Reorganizing the files that the streamwrapper classes are in to slim it down to two files (includes/stream_wrapper_registry.inc and includes/stream_wrappers.inc)
- Renaming all the class names to be CamelCased rather than underscore_separated.
- Tried to un-rat-fuck the whole Drupal{Public,Private,Temp}StreamWrapper inheritance tree.

justin:
- convert DrupalStreamWrapperRegistry to a static class rather than a singleton, and updated the callers
- moved stream_scheme() and stream_wrapper() into DrupalStreamWrapperRegistry
- renamed all of the underscore-named functions to CamelCase

AttachmentSize
file_streamwrappers_227232.patch 35.59 KB
Testbed results
file_streamwrappers_227232.patchfailedFailed: 11615 passes, 14 fails, 40 exceptions Detailed results

#119

tic2000 - June 27, 2009 - 08:39
Status:needs work» needs review

#120

System Message - June 27, 2009 - 11:35
Status:needs review» needs work

The last submitted patch failed testing.

#121

jmstacey - June 28, 2009 - 05:30

Quite a bit of work has been done since that patch was generated, especially with the File API. The latest checkout has all file related tests passing (last I checked). The fork needs to be merged, retested locally, and then a working patch can be submitted for review. While all the tests are working, they are not comprehensive in the context of the additional and changed functionality and could use some more work. However, initial review may commence.

I'll try to slip this in tomorrow if I have a little time between sessions at DrupalCamp Colorado.

#122

justinrandell - June 28, 2009 - 16:14

nice work. i've just pushed up a change to move the stream wrapper initialisation in to a bootstrap phase.

#123

aaron - June 29, 2009 - 05:55

Folks should take a look at #499156: CDN integration: use file_create_url() for all file URLs, as a) it's going to break our patch if committed first, which b) adds bloat to this patch, since we'd have to rewrite that patch, and c) it's on the fast-track for committing, as it's tagged Favorite-of-Dries.

#124

aaron - June 29, 2009 - 06:03

Also, FYI, we're on track for getting this RTBC tomorrow, as we'll have a bunch of folks helping to review at the Media Sprint. See you in the morning!

#125

jmstacey - June 29, 2009 - 06:50
Status:needs work» needs review

I've attached the latest diff so that the testing servers can have a go at it. The file and user profile tests are passing locally. We'll have our work cut out for us to get it RTBC but I think we can pull it off. Here's a quick list of things off the top of my head, I'm sure there are more:

  • Evaluation of mixed stream wrapper and normal path support in the API. At the moment there's an odd mixture of some functions that only support stream wrappers (those that will in some way interact with the DB), those that support both (interacting directly with local files), and those that only accept a regular path (some file_unmanaged_*). I attempted to document this compatibility in each relevant function.
  • Create documentation (introduce stream wrappers) and update the function docs. The move to stream wrappers is a bit of a fundamental shift in thinking about files. I've added a placeholder @see for a link to an introduction to stream wrappers on nearly every function that deals with the streams. I think this will go a long ways towards reducing the amount of confusion. Stream wrappers are essentially the equivalent to creating your own filesystem implementation using FUSE.
  • There's explicit checking for streams because of path separator issues. There's probably a more elegant solution such as extending DrupalStreamWrapperRegistry::normalizeUri()
  • Handling of "default" type (public/private). At the moment there is an assumption that public will always be used as the default wrapper. It would be prudent to remove the configuration options for this setting, or allow any registered wrapper to be selected as a default, but public would always be the safe fallback.
  • Test case/coverage improvement. The tests do not (and did not originally) provide full case and code coverage, particularly when it comes to storing files in subdirectories of the basepath.
  • Temp://. At the moment the actual use of this wrapper is uncertain. It may simply serve as a way of determining what temporary path to use for use with tempnam. See the note below about DB changes.
  • DB conversion. We need to get the upgrade path from D6 working and tested. Most of the work is done, I think we just need to implement the duplicate file handling as was suggested on #329301: Rename {files} to {file} and add unique key to the filepath column.

DB Changes
The aforementioned DB changes in #111 have been made with the exception of the removal of the 'status' column. The use case at hand is that you have a temporary image that you would like to make publicly available (e.g. advanced caching). temp:// in its current incarnation returns the system temp path (a replacement for file_directory_temp()). Temp can have many different meanings. I'd rather not require a new stream wrapper for a use-case like this. For the time being I've retained the status column for the continued indication of permanent/temporary files rather than move towards something messy like public://temp://myfile.jpg.

AttachmentSize
stream_wrappers_227232-1.diff 211.23 KB
Testbed results
stream_wrappers_227232-1.difffailedFailed: Failed to install HEAD. Detailed results

#126

System Message - June 29, 2009 - 07:15
Status:needs review» needs work

The last submitted patch failed testing.

#127

Wim Leers - June 29, 2009 - 08:58

After a skim through this issue, I have two questions:
- I'm wondering what performance toward S3 and so on would be like. Has anybody performed benchmarks already? Shouldn't stream wrappers to potentially high-latency remote servers not be used with care, e.g. only from within cron, to prevent extremely long page rendering times? (If PHP had threads and no limited execution time, this would not be a problem, or at least much less so).
- Doesn't the lack of chmod() support have major implications? But it does seem to be in the latest patch.

Very quick code style review:
- It's not "nineth" but "ninth".
- It's not "compatability" but "compatibility".
- Tabs instead of spaces on several occasions.
- Always use braces, never omit them (in single-line if/else tests).
- As far as I know we don't mention the type of a variable in doxygen in Drupal?

#128

aaron - June 29, 2009 - 18:28

"As far as I know we don't mention the type of a variable in doxygen in Drupal?"

http://drupal.org/node/1354 does not, but there are places in core that do.

#129

jmstacey - June 29, 2009 - 18:53

To answer your questions.
1. Stream wrappers to potentially high-latency remote servers will need to be used with care. Any implementations will need to implement caching for best performance. This is no different than writing an Amazon S3 backed filesystem using FUSE. I think most implementations won't have a problem until you get to the point of uploading an extremely large file. At this point, any performance enhancers would rely on the wrapper implementation to do stuff in the background and such.

2. drupal_chmod() supersedes chmod() and is backwards compatible with normal paths. For streams, drupal_chmod() will pass the job off to the appropriate wrapper. In the case of public, private, temp, this just interpolates the URI and calls normal chmod(). In other cases the wrapper might just ignore the chmod and do nothing (e.g. for S3). For normal paths, drupal_chmod will just fire off a regular chmod.

- The spelling corrections are fixed and I replaced any tabs the slipped in.
- I didn't notice any if/else short statements in my skim but I must just be overlooking them because I know I've seen them. Line numbers would help.
- As far as documentation, the Doxygen page doesn't mention using it, however I find it very useful because the type is not always mentioned in the parameter description. You immediately know what's expected or to be returned without looking at the code. Some current core functions such as the existing drupal_chmod add the variable type. We should make a decisions and update the Doxygen page to specify the standard we'll use. The coding standards page also needs to be updated to reference the alternate camelCase use with SimpleTests.

#130

jmstacey - June 29, 2009 - 19:07
Status:needs work» needs review

Just the minor modifications I mentioned before.

AttachmentSize
stream_wrappers_227232-2.diff 210.91 KB
Testbed results
stream_wrappers_227232-2.difffailedFailed: Failed to install HEAD. Detailed results

#131

aaron - June 29, 2009 - 19:14
Status:needs review» needs work

potential workflow for s3 using stream wrappers:

* user uploads a file through the s3 wrapper
* s3 wrapper saves the file locally and marks it to be moved on the next cron
* later, someone wants the URL for that file
* if the file has not yet been moved, the local url is returned
* else if the admin checkbox for using s3 is turned off, the local url is returned
* else if the server has otherwise determined that s3 is down, the local url is returned
* else the remote s3 url is returned

performance should be optimal in this setup

#132

aaron - June 29, 2009 - 19:15
Status:needs work» needs review

sorry -- will this rerun the test?

#133

System Message - June 29, 2009 - 19:25
Status:needs review» needs work

The last submitted patch failed testing.

#134

aaron - June 29, 2009 - 19:40

Fatal error: Class 'DrupalStreamWrapperRegistry' not found in /var/www/d7-stream/includes/file.inc on line 1615 when you run install.php. i'll look at that right now.

#135

tic2000 - June 29, 2009 - 20:07

+  // Try created the .htaccess file if it's missing from the public, private, or temp directories.

#136

justinrandell - June 29, 2009 - 21:07

at #134 - that's because we use the files in the installer, but don't initialise the Drupal's stream wrappers. working on patch...

#137

aaron - June 29, 2009 - 22:10

I just committed a refactoring of file_get_mimetype in git, so that it now works through the schema/uri (allowing us to set custom mimetypes for youtube videos, etc).

#138

aaron - June 30, 2009 - 00:31

added constants for SCHEME_PUBLIC, SCHEME_PRIVATE, SCHEME_TEMP

#139

aaron - June 30, 2009 - 02:17

this latest patch is up to #138, and also includes a new test for SimpleTest, as we now add a simpletest:// stream wrapper for testing.

AttachmentSize
stream_wrappers_227323_3.diff 216.52 KB
Testbed results
stream_wrappers_227323_3.difffailedFailed: Failed to apply patch. Detailed results

#140

Rob Loach - June 30, 2009 - 20:42

Wow, this is awesome... Subscribing.

node:// WHAT?!

#141

jmstacey - July 1, 2009 - 06:23

Current known issues:

  • Drupal installation bombs. This is caused because stream wrappers are not initialized in cases where file.inc are included outside of a bootstrap.
  • Upload tests (and possibly others) are failing. These are test issues that will be resolved with the completion of the simpletest:// stream wrapper and its integration with all file related tests.

#142

jmstacey - July 10, 2009 - 00:23
Status:needs work» needs review

Here's the latest patch. This resolves the previously known issues, and some other unexpected breakages.

Here's the most recent list of things that need work:

  • DB upgrade path from Drupal 6. Partially implemented, see todo in upload_update_7000().
  • D.O. Documentation introducing stream wrappers (see notes in #125). Search and Replace @see lines with real URL.
  • Abstract static method cleanup. Initially, there was some confusion about abstract static methods... they don't make sense [Source]. This resulted in some messy calls like demonstrated on http://drupalbin.com/10347. While static functions would be ideal, I don't know if this can be accomplished elegantly. I'll be happy with the current implementation if we only pass data once (when instantiating).
  • Completion of "default" type handling. Aaron started by implementing constants for the core stream wrappers. Hardcoding of wrappers needs to be refactored to use these constants which are declared at the top of stream_wrappers.php.
  • There are way too many assumptions about using the public wrapper. This ties in with the default type handling. I think we should provide a system default, but ultimately individual modules can determine their own behavior. Example: BlogAPI probably shouldn't assume that we're using a public filesystem.
  • Usage of normalizeUri(). Anything that has the possibility to be located in a subdirectory should use normalizeUri and assume a prepended slash. This takes care of cases such as profile images that can be configured by the user. Images might be stored in the root directory so normalizeUri will turn "public:///" into "public://" and the assumption of the forward slash will allow subdirectories such as "public://pictures" to work.
  • Test coverage, robustness, and cleanup--we can always use more. Test subdirectories, make use of simpletest://, test core stream wrapper implementations, etc.
  • Ongoing: Coding standards review and cleanup

Upcoming code sprint in Philadelphia, July 26-27.

AttachmentSize
stream_wrappers_227323_4.diff 225.99 KB
Testbed results
stream_wrappers_227323_4.difffailedFailed: 11575 passes, 14 fails, 6 exceptions Detailed results

#143

System Message - July 8, 2009 - 17:00
Status:needs review» needs work

The last submitted patch failed testing.

#144

jmstacey - July 10, 2009 - 03:06
Status:needs work» needs review

This reroll fixes the file download test which was failing because the private file path doesn't exist on the testing slaves. This also includes cleanup of the abstract static issue and a couple other exceptions. Simpletest is reporting a failure locally, but not what test specifically, so I'm hoping the testing server will shed more light on that problem.

AttachmentSize
stream_wrappers_227323_5.diff 226.48 KB
Testbed results
stream_wrappers_227323_5.difffailedFailed: 11587 passes, 2 fails, 0 exceptions Detailed results

#145

System Message - July 10, 2009 - 03:20
Status:needs review» needs work

The last submitted patch failed testing.

#146

jmstacey - July 10, 2009 - 06:07
Status:needs work» needs review

Thanks to cwgordon7 this test should finally pass. We're going to look into breaking this down into smaller and more specialized patch sequences, so this may be one of the last giant patches.

Edit: Actually this won't pass because #515280: file_check_directory() should create recursively was just committed.

AttachmentSize
stream_wrappers_227323_6.diff 226.4 KB
Testbed results
stream_wrappers_227323_6.difffailedFailed: Failed to apply patch. Detailed results

#147

System Message - July 10, 2009 - 04:30
Status:needs review» needs work

The last submitted patch failed testing.

 
 

Drupal is a registered trademark of Dries Buytaert.