PHP allows users to implement their own stream wrappers. This makes it possible to access virtual or remote filesystems as if they were on the local disk using fopen(), unlink(), file_get_contents() etc. The files may be stored e.g. as real files on a central server, as virtual files in a database, on remote a storage service like Amazon S3.

Handling uploaded files with Drupal in a webserver cluster is not trivial if the parallel servers do not have a shared directory on disk, e.g. using NFS. Using a stream wrapper administrators can implement their own custom logic that implements a global filesystem in their server cluster. Other uses include the ability to push additional copies of files to e.g. a CDN.

Stream wrappers has a well-defined interface that isn't specific to Drupal, so implementations of such wrappers are useful for other purposes as well.

Stream wrappers work almost like regular files, but only almost. Certain functions do not support stream wrappers, including realpath(), glob(), chmod(), tempnam() etc. So you cannot just insert "mywrapper://mysite.example.com/files" instead of "files" in Admin → Site configuration → File system.

However, with only a few modifications to the Drupal core it works. We currently run several sites in production using stream wrappers for file_directory_path and file_directory_temp. We have done a few modifications to include/file.inc:

  • we added a new function, file_realpath(), that implements realpath() for stream wrappers, and then replaced all occurrences of realpath() with file_realpath()
  • we added @ in front of chmod()
  • we replaced tempnam(realpath(file_directory_temp()), 'tmp_') with file_directory_temp() .'/tmp_'. md5(getmypid() . microtime(true))

We also had to do a few minor tweaks to some contrib modules, but generally things have just worked.

I suggest that Drupal is made stream wrapper friendly. As mentioned, this requires a few changes to file.inc and that developers refrain from using certain filesystem functions.

If people like this idea, I will post patches. Our current solution requires some adjustments in order to be applied to Drupal CVS (e.g. microtime(true) is PHP 5 only).

CommentFileSizeAuthor
#183 stream_wrappers_227232-183.patch33.93 KBpwolanin
#182 stream_wrappers_227232-181.patch33.93 KBpwolanin
#181 stream_wrappers_227232-180.patch33.93 KBpwolanin
#178 stream_wrappers_227232-178.patch34.29 KBpwolanin
#177 stream_wrappers_227232-177.patch34.26 KBpwolanin
#176 stream_wrappers_227232-175.patch34.35 KBpwolanin
#174 stream_wrappers_227232-174.patch34.37 KBpwolanin
#168 stream_wrappers-168.patch33.73 KBjmstacey
#166 stream_wrappers_227232.patch33.39 KBdrewish
#163 stream_wrappers_227232-163.patch31 KBpwolanin
#161 stream_wrappers_227232-160.patch31.06 KBpwolanin
#142 stream_wrappers_227323_4.diff225.99 KBjmstacey
#158 stream_wrappers_227232-9.patch31.82 KBjmstacey
#154 stream_wrappers_227232-8a.patch30.97 KBjmstacey
#148 stream_wrappers_227232-7.patch30.79 KBjmstacey
#146 stream_wrappers_227323_6.diff226.4 KBjmstacey
#144 stream_wrappers_227323_5.diff226.48 KBjmstacey
#139 stream_wrappers_227323_3.diff216.52 KBaaron
#130 stream_wrappers_227232-2.diff210.91 KBjmstacey
#125 stream_wrappers_227232-1.diff211.23 KBjmstacey
#118 file_streamwrappers_227232.patch35.59 KBdrewish
#116 stream_wrappers.patch34.15 KBAnonymous (not verified)
#115 227232-stream_wrappers-inprogress-1.patch34.33 KBjmstacey
#103 streamwrapper-D7-38.patch66.33 KBc960657
#101 streamwrapper-D7-37.patch65.86 KBc960657
#100 streamwrapper-D7-36.patch67.28 KBc960657
#94 streamwrapper-D7-35.patch68.34 KBc960657
#91 streamwrapper-D7-34.patch68 KBc960657
#88 streamwrapper-D7-33.patch67.74 KBc960657
#86 streamwrapper-D7-32.patch61.43 KBc960657
#82 streamwrapper-D7-31.patch67.7 KBc960657
#80 streamwrapper-D7-30.patch69.61 KBc960657
#78 file.test.txt52.98 KBdrewish
#76 streamwrapper-D7-29.patch76.74 KBc960657
#72 streamwrapper-D7-28.patch69.02 KBc960657
#69 streamwrapper-D7-27.patch76.47 KBc960657
#64 streamwrapper-D7-26.patch76.46 KBc960657
#63 streamwrapper-D7-25.patch51.26 KBc960657
#58 streamwrapper-D7-24.patch48.46 KBc960657
#56 streamwrapper-D7-23.patch48.4 KBc960657
#54 streamwrapper-D7-22.patch40.88 KBc960657
#51 streamwrapper-D7-21.patch49.31 KBc960657
#49 streamwrapper-D7-20.patch49.26 KBc960657
#45 streamwrapper-D7-19.patch50.79 KBc960657
#42 streamwrapper-D7-18.patch51.59 KBc960657
#41 streamwrapper-D7-15.patch50.67 KBc960657
#40 streamwrapper-D7-14.patch42.8 KBc960657
#39 file_create_url-14.patch16.16 KBc960657
#37 streamwrapper-D7-12.patch35.26 KBc960657
#36 streamwrapper-D7-11.patch41.11 KBc960657
#35 streamwrapper-D7-10.patch42.67 KBc960657
#33 streamwrapper-D7-9.patch29 KBc960657
#31 streamwrapper-D7-8.patch24.72 KBc960657
#30 streamwrapper-D7-7.patch24.69 KBc960657
#28 streamwrapper-D7-6.patch24.68 KBc960657
#17 streamwrapper-D7-5.patch20.87 KBc960657
#15 streamwrapper-D7-4.patch8.19 KBc960657
#11 streamwrapper-D7-3.patch7.64 KBc960657
#8 streamwrapper-D7-2.patch9.48 KBc960657
#3 streamwrapper-D7-1.patch7.76 KBc960657
#2 streamwrapper-D6-1.patch6.05 KBc960657
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

AjK’s picture

Just so you know, Drupal 7 will be the first PHP5 only release.

c960657’s picture

FileSize
6.05 KB

This patch shows our changes to the Drupal 6 core. I'll whip up a patch against Drupal 7 as well.

Comments are welcome.

c960657’s picture

Status: Active » Needs review
FileSize
7.76 KB

Patch against HEAD.

This has been tested using the Amazon S3 package in PEAR , and file_directory_path set to "s3://mybucket/files" (running this example requires an Amazon S3 account).

require_once 'Services/Amazon/S3/Stream.php';
Services_Amazon_S3_Stream::register('s3',
  array('access_key_id'     => '12345678901234567890',
        'secret_access_key' => '1234567890123456789012345678901234567890' ));

This runs extremely slowly due (a caching layer is needed), but it serves as a proof of concept.

Wim Leers’s picture

Another interesting patch, c960657. Thanks! And subscribing. Will review when/if I find the time.

P.S.: I wish you had an easier to remember and type nickname though ;)

Susurrus’s picture

I would think this would be a perfect situation where a unittest would be useful. It seems like all you need to do is create a new class and then access it. Creating a class that just accesses bunk data that would be static between all instances could do this and allow testing for everyone, including users who don't have an S3 account.

I'll see if I can look into that this weekend.

Dries’s picture

You have my interest with this patch, c960657. Cool stuff.

file_realpath() could use an example and/or some more verbose explanation.

It would be great if you could extend the PHPdoc to capture some of your experience and knowledge. It's probably useful to include some information about stream wrappers to help people understand certain decisions.

I agree with Susurrus that some tests would be key to avoid that we re-introduce changes that might break stream wrappers.

moshe weitzman’s picture

Subscribe - nice stuff.

Where would you see caching being added for an S3 stream?

c960657’s picture

FileSize
9.48 KB

I expanded the comments for file_realpath().

I also added support for resolving "/../" etc. This isn't necessary in a setup like ours, because you cannot escape the S3 bucket using "/../". But in general it is required in order to make file_check_location() work reliably.

The resolver in _file_remove_dot_segments() is based on the pseudo-code in RFC 3986 (about URIs). It is probably possible to do something similar to this using fewer lines of code (e.g. using regular expressions), but it isn't as trivial as it may sound, so for now I decided go with something that is known to work in a secure and reliable way.

WRT testing it would be nice if the whole testsuite for all modules using files could be run both with and without file_directory_path set to a path using a custom stream wrapper. I don't know how to achieve that.

>Where would you see caching being added for an S3 stream?
We have implemented caching in separate stream wrappers. We have file_directory_path set to filecache://dbstatcache://s3://static.example.com/files, where static.example.com is the name of the S3 bucket. Filecache caches files on the local disk of the webserver, but passes all stat() calls on to the dbstatcache. Dbstatcache caches stat information in a database that is accessible by all webservers in the cluster (this is necessary in order to detect if an already cached file is deleted or updated), but passes all fopen() calls on to the s3-wrapper. This may sound overly complicated, and perhaps it is, but the initial idea was to create simple general-purpose components and perhaps release them as open source (so far we have only released the S3 package through PEAR, but we don't mind sharing the other).

So far we only use S3 as a SAN with local caches, but using custom_url_rewrite_outbound() we could serve files directly off S3.

Our solution works, but the speed is not impressive. The end user usually don't feel a difference because of the local cache, but uploading big files is rather sluggish. E.g. ImageCache is not optimized for high-latency filesystems. Improving that is the next step.

chx’s picture

There are code style issues like } else { also _file_remove_dot_segments is extremely complex, are we sure we need all this?

Susurrus’s picture

Does realpath() not do the job for removing dots?

c960657’s picture

FileSize
7.64 KB

>are we sure we need all this?
Perhaps not. As mentioned, we haven't implemented it in our setup, and so far without problems, so perhaps nothing in core relies on the /../ resolving feature of realpath(). The same can easily be archived using dirname().

file_realpath() could be altered to simply return false on all stream wrapper paths containing /../ rather than trying to resolve them using realpath/file_realpath(). This would still allow file_check_location() to detect potentially dangerous paths.

>Does realpath() not do the job for removing dots?
Yes, but only for regular files that exist on disk. It does not work for stream wrappers or arbitrary strings containing dots. This is the reason for introducing file_realpath().

I have attached a regexp-based approach for removing dot segments, just to show an alternative implementation. It is fewer lines of code, but as mentioned, resolving of dot segments may not be necessary at all.

Dries’s picture

I suggest that we leave out the more complex dot-handling for now. It's probably better to focus on testing, and to introduce the complex dot-handling when we have a real need for it, and when we can actually write tests for it.

c960657’s picture

I updated the patch to simply return FALSE on paths containing "/../". AFAICT this is enough to prevent security exploits (assuming that the wrapper is relatively sane).

I have made file_realpath() less complicated by making it illegal to specify file_directory_path and file_directory_temp with a file:// prefix. I don't see a need to allow two equivalent syntaxes for specifying filesystem paths - it will only introduce another attack vector.

Dries’s picture

You didn't upload your latest patch.

c960657’s picture

FileSize
8.19 KB

Sorry, here it is :-)

Dries’s picture

Thanks c960657. It looks good to me.

One more question -- could you write some SimpleTests for this (without having to pull in an Amazon S3 library)? It would help ensure this feature gets maintained properly going forward.

c960657’s picture

FileSize
20.87 KB

I have added a dummy stream wrapper that may be used for testing.

I added two test classess: FileTestCase and FileStreamWrapperTestCase. The latter inherits from the former and is basically just a way to make sure that all tests are run with and without the dummy wrapper.

I have added tests for some of the functions in file.inc. I'd like to hear some comments on whether this is the right direction before I write further tests. I am new to writing unit tests.

Also, I have made a few more changes to file.inc:

  • file_check_directory() now uses the $mode argument for mkdir() in addition to specifying in afterwords using chmod(), because with stream wrappers chmod() doesn't work. The chmod() call is still necessary, because the $mode parameter of mkdir() is modified by the current umask.
  • file_check_location() now verifies that dirname($source) exists (i.e. $source does not have a stream wrapper prefix and a ".." segment)
c960657’s picture

A question about file_directory_temp:

At Administer → Site configuration → File system, the description for the "Temporary directory" text field says: A file system path where uploaded files will be stored during previews. However, AFAICT uploaded files end up in file_directory_path while being previewed. Is the description wrong?

Would it be reasonable to say that file_directory_temp shouldn't be used for saving files between requests? In that case file_directory_temp could be restricted to always point to the local disk, i.e. not use stream wrappers. The purpose is to supply a place on the local disk where native PHP functions that does not work with stream wrappers can write. E.g. imagegif() does not work with stream wrappers, so it would be convenient to write to a temporary file before copying the file to file_directory_path.

moshe weitzman’s picture

I believe that we keep file uploads in temp dir until a node is saved (for example). thus, the file neds to persist as the user is previewing and improving his node.

c960657’s picture

I thought so too. But the upload module appears to save uploaded files directly in file_directory_path (using file_save_upload() in upload_node_form_submit()). The fact that the node hasn't been saved yet is represented using the FILE_STATUS_TEMPORARY flag in the "files" table. When the node is saved, the flag is changed to FILE_STATUS_PERMANENT using file_set_status().

The remaining places in core that use file_save_upload() either discard the temporary file after having read it (includes/locale.inc) or copy it to another location (modules/system/system.admin.inc and modules/user/user.module).

But perhaps I missed something when grep'ing around ... ?

dopry’s picture

no file_directory_temp is no longer used for file uploads or storing previews. The description on the file system settings page is incorrect.

I conceptually like stream wrappers and the idea of using them. However, I don't see their utility with this implementation.

  • url generation is not addressed
  • access control is not addressed
  • error handling is not addressed.
  • hook_file can cover every use case mentioned except S3 storage.
  • FUSE or JungleDisk can do the same without having to hack realpath.
    for FUSE: http://code.google.com/p/s3fs/wiki/FuseOverAmazon

I mean it would be totally cool to be able to write fileapi as a stream wrapper, but really it's just another layer of abstraction in an already difficult and poorly understood API. It doesn't allow me to do anything I can't already do with more mature tools.

dopry’s picture

As a systems administrator I could never recommend this as a solution to cluster file distribution when there are file systems like OCFS and NFS freely avaliable, as well as various SANS options that are orders of magnitude more mature.

c960657’s picture

I am not sure what you miss WRT url generation, access control and error handling?

The URLs generated by file_create_url() work fine as long as you use private files. Errors generated by custom stream wrappers are handled the same way in PHP as errors from the local filesystem.

The hook_file stuff and s3fs look interesting, though.

dopry’s picture

@c960657:

re: file_create_url with private files only... so in the S3 use case it has to be read from the remote service and served by drupal... so the file has to move across 2 networks before getting to the client, and the process serving the request has to sit there waiting reducing the number of clients your webserver can support. It would be so much nicer if files could be served directly from the remote service... then access control integration becomes an issue.

re: error handling... right now a lot of file handling happens with the assumption that underlying errors are permanent... ie) if a file can't be stat'ed it doesn't exist... however with stream handlers to a remote service this could be a temporary error.... The api layer above the stream handlers will need to be tolerant of these temporary errors that do not happen with a real filesystem. I'm not sure if it's really an issue, but something that should be considered maybe a new class of soft error for file operations.

I really like the concept of stream wrappers, but I think the implementation has to go deeper to be worth while. If we're going to embrace streams we might as well fully embrace streams for the filesystem... not just one particular stream for the entire FS and we also need to address URL generation and access control, even more so than the error handling in my opinion.

It seems like a much cleaner and more native way of implementing the drivers for my mountpoints / fileapi concept. I already have the mount point management and path parsing layers written.. I haven't revisited it since late Drupal 4.7 or early D5 development so who knows what condition it is in...

I'll think more about and keep playing devil's advocate.... I know I already have my ideas and long term vision, alternatives are welcome... I'm usually pretty sleep deprived when I hop on here.

c960657’s picture

>so in the S3 use case it has to be read from the remote service and served by drupal...
>so the file has to move across 2 networks before getting to the client,
Yes, the first time, until the webserver has a copy in its local cache.

>It would be so much nicer if files could be served directly from the remote service...
One way of doing this is using custom_url_rewrite_outbound(). I don't have experience with that yet, though (the possibility of rewriting the outgoing hostname was added in Drupal 6, and our current setup was made before that).

re: error handling:
I don't know much about NFS, but I assume any networked storage may have temporary errors.

>It seems like a much cleaner and more native way of implementing the drivers for my mountpoints
>/ fileapi concept.
I tend to agree. However, using a custom-written stream wrapper, you have greater flexibility WRT to what happens when a file is written, moved etc., and you can write your own local rules in PHP rather than writing your own FUSE wrapper - a thing that most PHP programmers probably aren't familiar with.

>I'll think more about and keep playing devil's advocate....
I appreciate your comments.

We initially chose to use stream wrappers because it seemed like a non-intrusive and non-system-administration-intensive way of supporting multiple servers in a cluster. I am not sure it was the best solution, but at least it works, though not as fast as we would like during uploads (perhaps partly because we also map file_directory_temp to our S3 drive because of the current description as mentioned in #18). Also, the work connected with keeping contrib modules stream wrapper-aware turned out to be bigger than initially expected (though, if stream wrapper support is made official in Drupal core, we can push our module changes upstream).

dopry’s picture

re: pulling from s3...
it's a cost and I understand caching locally, but it's a big waste of bandwidth, and as you noted the latency issues can be killer. thats why I halted prior S3 work the latency was proving to be a bigger challenge than I anticipated.

re: generating URL's.
if file_create_url were stream aware or had a way of hooking into URL handlers for different streams we could probably solve this in core. just a thought...

re: NFS... not nearly as frequently as S3 or as a result of latency, and it's all kind of fubar on a site when NFS goes awol in a cluster... like ugly bad :) like no document root bad...

re: fileapi/stream wrappers... so I ripped the mountpoint system from fileapi and put it in my sandbox
http://cvs.drupal.org/viewvc.py/drupal/contributions/sandbox/dopry/mount... (D5, should be trivial to port to D7) with a completely untested modification to use stream wrappers instead of the original driver system I was developing for file api... It would be completely dependent on the stream wrappers in core thing, but I see a good match in my head. The contents of mountpoint.module would probably move into file.inc and mountpoint_ui would probably be headed to system.module. With tokenized paths and a smarter file_create_url it could be a fun mix.

re: you'd probably be happier with OCFS or NFS, the administrative overhead is actually minimal if you're already running a cluster, but if it gets the job done... I do still have reservations about file manipulations
and the like... (imagemagick, GD, etc)... It seems like going this route we will be copy files to local tmp storage pretty often until the rest of php's coolness becomes stream aware... I should probably look into that since It will affect a lot of the development work I do.

/me mutters about getting all excited about things he didn't want to get involved in and puts his grumpy hat back in the closet for a bit.

mikl’s picture

Well, since my esteemed colleague, c960657, is in Riga on vacation, I'd just like to say thanks for the input, dopry. It has certainly given us something to think about.

As c960657 mentioned, getting all this to work with contrib-modules is quite painful – I spend the greater part of wednesday this week trying to upgrade image(api|cache|field) to latest stable (since we have been running -dev releases for a while), and that wasn't all that fun.

In any case, we're not dead set on S3. In fact, http://drupal.org/project/cdn looks promising, but that also involves patching the core :(

c960657’s picture

FileSize
24.68 KB

Though NFS (or equivalent services) may be one solution for handling files in a cluster, I think the stream wrapper approach is still worth considering. In particular, I think it is convenient to be able to write your own logic in PHP that suits your particular needs.

NFS also has its issues, so I think it would be useful to have a choice between several different approaches.

I have looked at hook_file, and I fail to see how it covers the same use cases as a stream wrapper. I can see that you are able to hook into file load and save operations, but AFAICT you cannot e.g. use file_scan_directory() to enumerate files not present on the local machine.

Updated patch:

  • Only file_directory_path has stream wrapper support, i.e. we can still use tempnam(), imagegif() etc. on files in file_directory_temp.
  • The outdated description for the file system settings page is updated.
  • New utility function: file_is_wrapper() - useful for modules using native functions without stream wrapper support
  • image_gd_close() now supports stream wrappers
dopry’s picture

with hook_file but you could guarantee changes across the cluster on hook_file events... no you couldn't use file scan directory in the same way via hook_file, but any FUSE mounted file system could be used the same way.

c960657’s picture

FileSize
24.69 KB

Chasing HEAD.

c960657’s picture

FileSize
24.72 KB

Chasing HEAD.

lilou’s picture

Status: Needs review » Needs work

Need to be re-rolled :

file.test must be moved to modules/simpletest/test/

c960657’s picture

FileSize
29 KB

Reroll. Still work in progress.

In order to test the patch, I added the following to my settings.php in order to be able to run all existing tests with file_directory_path pointing to a stream wrapper. :

require_once $_SERVER['DOCUMENT_ROOT'] . '/modules/simpletest/tests/file.dummystreamwrapper.inc';
stream_wrapper_register('dummy-stream-wrapper', 'FileDummyStreamWrapper');
$conf['file_directory_path'] = 'dummy-stream-wrapper://sites/example.org/files';

This is probably not the way to go, but at this stage it is probably a good way to ensure thorough testing. I should work on making a regular testcase that test various modules that use files.

There is a failed test here and there, but generally it doesn't look too bad.

drewish’s picture

subscribe.

i'm interested in reviewing this further but there's some obvious things that need cleanup. the phpdoc doesn't follow the coding standards and the naming of things seems odd, file_realpath should probably be drupal_realpath, etc.

c960657’s picture

FileSize
42.67 KB

The patch is more or less complete, but lacks tests. No ready for review yet.

Instead of the addition to settings.php mentioned in comment #33, I now hacked DrupalWebTestCase::setUp() to set file_directory_path to a path with a wrapper prefix, and file_downloads to FILE_DOWNLOADS_PRIVATE. At the top of boostrap.inc I added code to always register the dummy stream wrapper with PHP. These two changes are included in the patch but are not intended to be committed.

With these two non-standard settings, all existing tests passes except two: 5722 passes, 2 fails, 0 exceptions

Without the two non-standard settings, all existing tests passes.

All tests have been run with the patches from #299290: User tests exceptions and #310113: SimpleTest: assertFieldByXPath() doesn't work in some environments, and with the change from #298309: DatabaseStatement::execute with PDO::FETCH_CLASS calls __construct() after setting object attributes reverted.

I had to change some of the tests to make the compatiable with stream wrappers and/or private files. This may not be necessary, because the tests are expected to run from a default configuration, i.e. without stream wrappers and with public files. But on the other hand it may be convenient to be able to run all tests with these non-standard settings like I have been doing.

c960657’s picture

FileSize
41.11 KB

Chasing HEAD.

c960657’s picture

FileSize
35.26 KB

Chasing HEAD (following the big hook_file commit).

aaron’s picture

spoke w/ dopry & drewish in irc. looking at this with an eye of how emfield could benefit from the patch. i'll try to weigh in over the weekend.

c960657’s picture

FileSize
16.16 KB

Chasing HEAD.

c960657’s picture

FileSize
42.8 KB

Chasing HEAD.

(please ignore the patch in comment #39 - it belongs to another issue)

c960657’s picture

FileSize
50.67 KB

I have added tests for the new functions, drupal_realpath() and file_is_wrapper().

I also added a few litmus test to verify that some of the basic functions are working as expected when file_directory_path is a path with a stream wrapper prefix. Optimally the testbed robot should run the complete testsuite (all 6000 tests or so) with file_directory_path pointing to a directory with a stream wrapper prefix (the same applies for other configuration variables that affect many tests like clean_url's enabled/disabled, $db_prefix empty/non-empty etc. as discussed in comments #5 and #6 in #239542: Daily test runs). This is now possible, because setUp() preserves the existing values of file_directory_path and file_downloads like it currently does for clean_url.

If you want to run the complete suite using the dummy stream wrapper, add the following to settings.php:

require_once DRUPAL_ROOT . '/modules/simpletest/tests/file_dummy_stream_wrapper.inc';
stream_wrapper_register('dummy', 'FileDummyStreamWrapper');

Then go to http://example.org/admin/settings/file-system and change File system path from e.g. sites/example.org/files to dummy://sites/example.org/files, and set the download method to Private files. Then run the tests.

When the original file_directory_path is a regular file, all tests pass on Linux on Windows (except those that fail without the patch). When the original file_directory_path is a path with a stream wrapper prefix, all tests except the two mentioned in comment #35 pass on Linux (still waiting for the results from the Windows box).

c960657’s picture

FileSize
51.59 KB

Reroll.

c960657’s picture

Status: Needs work » Needs review

I think this works good enough to get some review.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

FileSize
50.79 KB

It looks like the testbot was complaining about streamwrapper-D7-15.patch, not streamwrapper-D7-18.patch. Here is a fresh reroll that passes all tests on my box.

c960657’s picture

Status: Needs work » Needs review
dopry’s picture

-    if (($mode & FILE_CREATE_DIRECTORY) && @mkdir($directory)) {
+    if (($mode & FILE_CREATE_DIRECTORY) && @mkdir($directory, 0775)) {

The following chmod will become useless with your edit. It is also scope creep for what you are implementing in this patch. Either leave it alone or update it the logic around directory creation and error reporting.

     // Set standard file permissions for webserver-generated files
-    @chmod(realpath($image), 0664);
+    @chmod($image, 0664);

Why is real path removed above? It's also removed from several places in the Tests is this purposeful? Should it have been changed to drupal_realpath?

   function assertFilePermissions($filepath, $expected_mode, $message = NULL) {
+    // File permissions are not supported by stream wrappers
+    if (file_is_wrapper($filepath)) {
+      return;
+    }

Should we be somehow checking permissions on the stream wrapped files? It seems naieve to just skip the permissions check because it's stream wrapped. Can we require stream wrappers used with drupal to implement stat?

c960657’s picture

The following chmod will become useless with your edit.

True. I have now removed the chmod(). The reason for setting the permissions with mkdir() is that chmod() does not work with stream wrappers.

Why is real path removed above? It's also removed from several places in the Tests is this purposeful? Should it have been changed to drupal_realpath?

Filesystem functions automatically resolves relative file references. Without stream wrappers, adding an extra realpath() is redundant. Adding an extra drupal_realpath() is redundant, with or without stream wrappering. Some functions give slightly different error messages if the specified file does not exist, but that is the only difference AFAIK.

Should we be somehow checking permissions on the stream wrapped files? It seems naieve to just skip the permissions check because it's stream wrapped.

The thing is that chmod() does not support stream wrappers. For some reason it is only possible to read stat information on stream-wrapped files, but not to modify it. Hopefully this will be fixed in future versions of PHP. The only exception (quite strange) is that you can specify permissions on new directories with mkdir().

Calling chmod() on a stream-wrapped file and then testing the outcome using fileperms() will always fail (unless the permissions already had the requested value). If Drupal's SimpleTest supported a "skip" result in addition to "pass" and "fail" like e.g. PHPUnit, this would have been the proper status to return. Alternatively we could just let the tests fail, but I think it is better to omit them so they don't remove people's attention from other failures.

As long as all files are to be saved with the same permissions, people could make their stream wrapper implementation apply default permissions on all new files. The default permissions could be hardcoded inside the stream wrapper or set in the wrapper's default stream context e.g. in settings.php (a similar approach would probably be required, if the files are saved on e.g. S3, NTFS or other filesystems whose permission models are much different from the one supported on Linux and used my chmod/chown).

c960657’s picture

FileSize
49.26 KB
dopry’s picture

Status: Needs review » Needs work

The directory creation logic is still incorrect.. failed mkdir's will not report an error...

c960657’s picture

Status: Needs work » Needs review
FileSize
49.31 KB

D'oh! Sorry. I hope I got it right this time.

naxoc’s picture

subscribe

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
40.88 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
48.4 KB

Sorry, the patch was missing file_dummy_stream_wrapper.inc (I wonder why this only caused two exceptions?)

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

FileSize
48.46 KB

Reroll.

c960657’s picture

Status: Needs work » Needs review
chx’s picture

Priority: Normal » Critical
Status: Needs review » Needs work

I would use

function file_is_wrapper($path) {
  // A wrapper prefix is at least two characters so that it can be distinguised
  // from a Windows drive letter, "C:/temp"?.
  preg_match('@^(?<wrapper>[a-z0-9.+-]{2,})://@i', $path, $matches);
  return $matches ? $matches['wrapper'] : '';
}

As empty strings casted to bool are FALSE this allows you to use it as you used in the patch but also to reuse it whenever you need the name wrapper as well.

Also, I think you got the importance of this feature wrong. Fixed :)

drewish’s picture

Finally had some time to dig into this, I think this would be a very cool feature. Here are my comments:

  1. file_directory_temp() and drupal_realpath() have bad PHPDoc block. The first paragraph should be a single sentence describing the function. Additional details should be in a subsequent paragraph.
  2. I don't think FileStreamWrapperTest::setUp() should be calling file_test_reset();
  3. FileStreamWrapperTest::testFileFunctions() is doing way to much in one function. It needs to be broken up into several smaller test functions. Actually that goes for all of FileStreamWrapperTest... I'd almost rather see you adding additional test cases to the existing test classes that specifically used stream objects.
  4. FileDummyStreamWrapper's PHPDoc block should start with /** (the second star is important) and has the same problem as file_directory_temp() in that it needs to start with a single sentence description.
  5. I'm not so sure about the file_test_init().. can't we do a drupal_include there?
  6. In image_gd_close() it's nit picky but $ok seems like a bad variable name. I'd look through core and see what else is used for holding a return code like that.
chx’s picture

I am looking at the

-    @chmod(realpath($image), 0664);
+   @chmod($image, 0664);

and similar and wonder "is this secure"?

c960657’s picture

FileSize
51.26 KB

This addresses the points raised in #60 and #61.

I discussed how to write tests with drewish on IRC. Now tests for stream wrapper support in some function, foo(), is added to the class that tests that function. So in addition to testFoo(), another function testFooWithStreamWrapper() is added. See the patch for a few examples. This is still work in progress.

Since the stream wrapper tests are now no longer limited to the file tests, I added drupal_web_test_case::useStreamWrapper() instead of FileStreamWrapperTest::registerStreamWrapper() (from the previous patch). I moved file_dummy_stream_wrapper.inc into /includes in order to be able to autoload it (if it is part of the SimpleTest module it isn't available for autoload when invoked from the SimpleTest browser AFAICT).

Also, the dummy wrapper prefix is not registered when invoked from the SimpleTest browser. I fixed this by adding a temporary hack to file_directory_path(). This is only necessary until #329177: Get rid of ugly simpletest regexp. lands.

I also discussed #62 with chx. He was worried that removing realpath() would introduce security exploits. I don't think it will, but please consider this carefully when reviewing the patch.

c960657’s picture

Status: Needs work » Needs review
FileSize
76.46 KB

Added more tests. I think all file-related tests now have a stream wrapper counterpart.

I modified some of the existing tests to use the files supplied by drupal_web_test_case::drupalGetTestFiles() rather than misc/druplicon.png etc. so they can test stream wrappers just by setting file_directory_path() to a path with stream wrapper prefix prior to running the test.

The hack in file_directory_path() mentioned in #63 has been moved to _drupal_bootstrap(), but this is still just a temporary measure until #329177: Get rid of ugly simpletest regexp. has been fixed.

Don't be alarmed about the size of the patch. The actual change to core is rather limited. The majority is test-related.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

I cannot reproduce the test failure reported by the testing bot. Any ideas?

chx’s picture

Status: Needs work » Needs review

There has been fixes. I request a re-test.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
76.47 KB

Found the problem.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Needs review

Reroll.

c960657’s picture

FileSize
69.02 KB
c960657’s picture

Status: Needs review » Needs work
drewish’s picture

Status: Needs work » Needs review

c960657, at one point (on irc?) you'd said you've got an s3 stream wrapper. would you mind posting that? i'd love to actually test this with a "real" wrapper.

also i thought you were going to move the test stream wrapper init code out of _drupal_bootstrap()?

drewish’s picture

Status: Needs review » Needs work

i guess i cross posted you.

c960657’s picture

FileSize
76.74 KB

We use the stream wrapper in the Services_Amazon_S3 PEAR package (maintained by me).

In addition we use two cache layers: one local file cache on each web server, and a shared cache in the database for stat data (so that if a file is e.g. deleted on one server, the other servers know that their cached copy is stale). Our file_directory_path is filecache://dbstatcache://s3://bucketname/files. It may look insane, but the idea is to separate the individual layers and create some general-purpose wrappers that can be submitted to e.g. PEAR, though I never got around to doing that (but if you prefer, you can of course combine S3 access and caching in one wrapper). We have combined all this in a custom module (currently not compatible with D7).

also i thought you were going to move the test stream wrapper init code out of _drupal_bootstrap()?

I am waiting for #329177: Get rid of ugly simpletest regexp. to land. For now I have moved it to the bottom conf_init(), the same place as where #329177 hooks.

c960657’s picture

Status: Needs work » Needs review
drewish’s picture

Status: Needs review » Needs work
FileSize
52.98 KB

the patch doesn't apply. i started to re-roll it but got bogged down and ran out of time fixing file.test. the thing that still bothers me about this patch is that it adds in an additional case to every function... really wish we could do everything as a stream or none of it because it just seems like there's bound to be bugs lurking in there.

that said i went a head a refactored the tests so that the stream tests extend each test case and the setup function calls $this->useStreamWrapper(); this way every test is run for both version. it's less efficient but at least we can be sure that both paths work correctly.

i'd merged the tests and in the process of creating a patch managed to delete all the changes but had a copy of file.test kicking around so i'm attaching that.

drewish’s picture

Issue tags: +D7FileAPIWishlist
c960657’s picture

FileSize
69.61 KB

Reroll based on attachment to #78.

really wish we could do everything as a stream or none of it because it just seems like there's bound to be bugs lurking in there.

I.e. make file_directory_path() always return a wrapper prefix, possibly file://? I guess that is feasible. Note that file_unmanaged_* would still have to support regular filesystem paths in order to work with e.g. file_directory_temp.

drewish’s picture

cool, i'll try to give this a close review tonite.

c960657’s picture

Status: Needs work » Needs review
FileSize
67.7 KB
dopry’s picture

I'd like to note ongoing work as a part of the media sprint is stream wrapper support. The current work can be found in contributions/modules/stream. It includes a public:// and private:// stream wrappers by default and a basic set of tests.

http://www.drupal.org/project/media
http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/media/resource

The work is much more invasive than this patch so I am implementing it in contrib with a replacement for the current file api that leverages the stream wrapper support more fully. It is a bottom up OO rewrite of how non-database local and remote resources are managed.

Status: Needs review » Needs work

The last submitted patch failed testing.

andreiashu’s picture

great post !!! subscribing

c960657’s picture

Status: Needs work » Needs review
FileSize
61.43 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
67.74 KB

Patch missed a file.

kwinters’s picture

Paths like "../logs/error_log" could potentially be sensitive and don't contains "/../" (but a path that ended with "/.." would always be a directory, not sure if that's OK in this context, rm -rf would be bad).

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
68 KB
jhedstrom’s picture

Subscribing.

Status: Needs review » Needs work

The last submitted patch failed testing.

c960657’s picture

Status: Needs work » Needs review
FileSize
68.34 KB

Reroll.

gordon’s picture

subscribe

Status: Needs review » Needs work

The last submitted patch failed testing.

ghoti’s picture

<aol>me too!</aol>

bjaspan’s picture

subscribe

drewish’s picture

just a note that #203204: Uploaded files have the permissions set to 600 was committed which adds a drupal_chmod().

c960657’s picture

FileSize
67.28 KB

Reroll.

c960657’s picture

FileSize
65.86 KB

Reroll.

eojthebrave’s picture

This seems like it has a lot of potential. Subscribing, and after a browsing through the patch real quick here are a couple of things I saw.

+ * Get the stream wrapper prefix from of path.
+ *
+ * @param $path
+ *   A string containing a path to a file or directory.
+ * @return
+ *   The wrapper prefix, e.g. "foo" for the path "foo://bar.txt", or FALSE is
+ *   $path is a regular filesystem path.
+ */

contains a few typos.

Should be.

+ * Get the stream wrapper prefix of a path.
+ *
+ * @param $path
+ *   A string containing a path to a file or directory.
+ * @return
+ *   The wrapper prefix, e.g. "foo" for the path "foo://bar.txt", or FALSE if
+ *   $path is a regular filesystem path.
+ */

There is a weird character in this line.

+ 'name' => t('File paths and directories – with stream wrappers'),

Not sure what it is supposed to be.

Looks like the weird character is also here

+ 'name' => t('Unmanaged file delete – with stream wrappers'),

and here

+ 'name' => t('Unmanaged file copying –with stream wrappers'),

and here

+ 'name' => t('File saving – with stream wrappers'),

and here

+ 'name' => t('Realpath resolving – with stream wrappers'),

c960657’s picture

Status: Needs work » Needs review
FileSize
66.33 KB

Reroll with typos fixed and weird characters removed (the weird character was actually an en dash, but a regular hyphen is probably better after all).

Status: Needs review » Needs work

The last submitted patch failed testing.

webchick’s picture

Status: Needs work » Needs review

HEAD broke.

jmstacey’s picture

Subscribing.

aaron’s picture

I plan to jump back into this issue, hopefully this week.

First off, I'd like to note that dopry's work at http://drupal.org/project/media (with the stream wrappers in its included Resource module) has been invaluable. Anyone interested in seeing PHP stream wrappers in core should take a look at that.

I plan to take a look at the current status again. I imagine it's killing a lot of puppies, but I see no way around that to at least get the initial jump start needed. However, once we have the basic engine in place, we'll be able to do some really exciting things.

aaron’s picture

FYI, I'm mentoring Jon Stacey (jmstacey), a student for GSOC, who's developing the Media module. We're changing tracks of that project slightly, to develop in d7 rather than d6, because there are things in core that will make the project easier to bring to fruition.

We'd like to see more, including this patch. However, the terms of a Drupal project w/ GSOC are that it can't be dependent on making commits in core.

That said, we've decided to focus the next week or two on core patches that would help things along. Specifically this patch, and probably another patch after this to make related changes to hook_file that take stream wrappers into consideration.

Jon's getting up to speed on this patch, as well as myself (I haven't looked at it in a few months).

I'm not asking anyone to vet the work or anything, and am definitely not asking that we speed up the process of adoption -- obviously we want to ensure the best work possible goes forward, and don't want to make any mistakes by committing something that's not quite ready.

However, I would be eternally grateful if interested folks could give a few minutes later in helping to review things, and help us find blocking issues.

Thanks,
Aaron Winborn

drewish’s picture

i think there's a few issues that this ties in with that all need to be resolved at once. the thing i don't like about this patch is that it lets us use normal paths or stream wrapped paths. as i've said before i think we should pick one or the other. if say that we're only going to allow stream wrapped paths we've got an approach that could help solve several issues: #366464: File table includes file_directory_path() #166759: Public/Private File Handling #329301: Rename {files} to {file} and add unique key to the filepath column. i think that dopry's work on the resources module had some very promising work started. he had separate stream wrappers for public, private and temp files. i think we need to have separate directories for each and which would let us keep the full path out of the database table and still allow us to finally get a unique key on the filepath column.

Anonymous’s picture

jmstacey’s picture

Aaron, drewish, and I had a chat this morning and the three of us have agreed to a tentative plan that I'll try to describe with our reasoning below.

The bottom line is that we need a unified resource handling system. We have an opportunity now to take care of several issues in one swoop. Ultimately, we envision core providing three basic stream wrappers: public://, private://, and temp://. This alone will bring additional functionality to core such as the ability to use both public and private filesystems at the same time. Additionally, this paves the way towards allowing contrib modules to provide extended functionality (youtube://, s3:// ...), and yet be built on a common base.

As drewish previously mentioned, this would take care of a couple issues we're facing:

#166759: Public/Private File Handling - This would be bundled up neatly once stream wrappers are implemented. The registered stream wrapper (public, private, temp,...) simply looks up the configured path based on the URI.

#366464: File table includes file_directory_path() - This issue can be cleaned up because the wrapper would determine the path. For example, the resource public://friends/tom/london.jpg would be handed off to the public stream wrapper which would consult the configuration settings and look in, for example, sites/default/files/friends/tom/london.jpg. A resource private://family/jennifer/wedding.jpg would be handed off to the private stream wrapper and look in say /home/bob/private_files/family/jennifer/wedding.jpg.

To accomplish this goal we need to concurrently tackle several areas:

We need to merge this patch with the work done by dopry on resource.module (currently part of Media).

#329301: Rename {files} to {file} and add unique key to the filepath column - We propose the following changes:

  • New table {file}. Identical to {files} except for the changes below.
  • 'filepath' renamed to 'uri'. Contains full URI (with wrapper). As files are migrated, their paths will be trimmed and the appropriate wrapper prepended (public://example_file.txt).
  • Index uri column. Improved query performance.
  • Remove 'status' column. The relevant wrapper (public, temp, etc.) will determine this. [pending full evaluation]

Upgrade path: We propose leaving the files table in place until all modules have migrated. We would provide helper functions to assist in migration. For example, given an old filepath return a new D7 file object that can be passed on to the appropriate storage mechanism. As a result of the proposed database schema changes we would need to provide patches for core modules that rely on the files table. At this time we are only aware of upload.module and filefield needing attention, but we will need a comprehensive evaluation.

How you can help
Getting this done will lay the groundwork for a lot of incredible new features, all built on a common foundation. However, there is a lot to accomplish and the more eyes the better. I'll be working on this full-time for the next couple of weeks. We need help from every area so drop by #drupal (irc) and ping one of us if you have the time.

Anonymous’s picture

jmstacey: awesome summary.

i'm beejeebus in IRC, i'll be online saturday and sunday, see you then.

Status: Needs review » Needs work

The last submitted patch failed testing.

jmstacey’s picture

Issue tags: +gsoc, +gsoc2009, +gsoc2009-jmstacey

Just a small update. I've created a fork of mikl's GitHub drupal mirror for use in rapid collaborative work on the patch: http://github.com/jmstacey/drupal/tree/master

At this point the fork consists of streamwrapper-D7-38.patch, merged to drupal head, and the stream wrapper interface and registry implementation from dropyr's work on resource.module (+coding standards and doc improvements). Tomorrow I'll be fixing up tests to make sure that basic stream wrapper support actually works, and then I'll start working on file API.

Word on the street is that there will be a mini-sprint this weekend with drewish and justinrandell. If you're interested in helping out, ping me and I'll provide commit access to the development branch.

jmstacey’s picture

Here's a checkpoint so that cursory reviews may begin. There may be regressions from streamwrapper-D7-38.patch because of a different design.

Changes so far:

  • Stream wrapper registry. Provides a way of determining which wrapper should handle a request for functions that go beyond PHP's built-in stream wrapper support (e.g. drupal_chmod). Original work from dopry.
  • Three core stream wrappers added (public, private, and temp). Original work from dopry.
  • Unit testing of wrapper registry register, unregister, and availability of core wrappers.
  • drupal_realpath() now hands responsibility to the registered wrapper if a valid scheme is detected. Fallback to PHP's realpath if URI does not contain a registered scheme. Core refactored.
  • drupal_chmod() now hands responsibility to the registered wrapper. Fallback to PHP chmod() if URI does not contain a registered scheme. Core refactored.
  • Helper functions to return a URI scheme and an appropriate wrapper instance (stream_scheme(), stream_wrapper() respectively).

Other notes:

  • The administrative interface and variables to support the three stream wrappers are not implemented.
  • drupal_tempnam() added, but uncertain implementation. Core not refactored.
  • Stream wrapper unit tests pending File Interface changes.
  • This patch is still a work in progress

Coming up:

  • Database changes and upgrade path
  • File interface overhaul
  • Core module changes
Anonymous’s picture

FileSize
34.15 KB

updated patch is just a rearrangement of the patch at #115 so it can be applied to a clean checkout of HEAD with -p0 - no code changes.

Anonymous’s picture

@jmstacey: i've got a git clone of your github drupal mirror, ping me in #drupal when you get a chance. i'm a git n00b, so i'll need some pointers re. pushing changes back up.

currently just trying to get basic user picture and upload module functionality to use the stream wrappers and see what works/doesn't work.

drewish’s picture

Here's a summary/export of what we got done today. If you'd like to see it in more gory detail check out jmstacey's drupal github repo.

drewish:
- Removing all the trailing white space.
- Reorganizing the files that the streamwrapper classes are in to slim it down to two files (includes/stream_wrapper_registry.inc and includes/stream_wrappers.inc)
- Renaming all the class names to be CamelCased rather than underscore_separated.
- Tried to un-rat-fuck the whole Drupal{Public,Private,Temp}StreamWrapper inheritance tree.

justin:
- convert DrupalStreamWrapperRegistry to a static class rather than a singleton, and updated the callers
- moved stream_scheme() and stream_wrapper() into DrupalStreamWrapperRegistry
- renamed all of the underscore-named functions to CamelCase

tic2000’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch failed testing.

jmstacey’s picture

Quite a bit of work has been done since that patch was generated, especially with the File API. The latest checkout has all file related tests passing (last I checked). The fork needs to be merged, retested locally, and then a working patch can be submitted for review. While all the tests are working, they are not comprehensive in the context of the additional and changed functionality and could use some more work. However, initial review may commence.

I'll try to slip this in tomorrow if I have a little time between sessions at DrupalCamp Colorado.

Anonymous’s picture

nice work. i've just pushed up a change to move the stream wrapper initialisation in to a bootstrap phase.

aaron’s picture

Folks should take a look at #499156: CDN integration: allow file URLs to be rewritten by hook_file_url_alter(), as a) it's going to break our patch if committed first, which b) adds bloat to this patch, since we'd have to rewrite that patch, and c) it's on the fast-track for committing, as it's tagged Favorite-of-Dries.

aaron’s picture

Also, FYI, we're on track for getting this RTBC tomorrow, as we'll have a bunch of folks helping to review at the Media Sprint. See you in the morning!

jmstacey’s picture

Status: Needs work » Needs review
FileSize
211.23 KB

I've attached the latest diff so that the testing servers can have a go at it. The file and user profile tests are passing locally. We'll have our work cut out for us to get it RTBC but I think we can pull it off. Here's a quick list of things off the top of my head, I'm sure there are more:

  • Evaluation of mixed stream wrapper and normal path support in the API. At the moment there's an odd mixture of some functions that only support stream wrappers (those that will in some way interact with the DB), those that support both (interacting directly with local files), and those that only accept a regular path (some file_unmanaged_*). I attempted to document this compatibility in each relevant function.
  • Create documentation (introduce stream wrappers) and update the function docs. The move to stream wrappers is a bit of a fundamental shift in thinking about files. I've added a placeholder @see for a link to an introduction to stream wrappers on nearly every function that deals with the streams. I think this will go a long ways towards reducing the amount of confusion. Stream wrappers are essentially the equivalent to creating your own filesystem implementation using FUSE.
  • There's explicit checking for streams because of path separator issues. There's probably a more elegant solution such as extending DrupalStreamWrapperRegistry::normalizeUri()
  • Handling of "default" type (public/private). At the moment there is an assumption that public will always be used as the default wrapper. It would be prudent to remove the configuration options for this setting, or allow any registered wrapper to be selected as a default, but public would always be the safe fallback.
  • Test case/coverage improvement. The tests do not (and did not originally) provide full case and code coverage, particularly when it comes to storing files in subdirectories of the basepath.
  • Temp://. At the moment the actual use of this wrapper is uncertain. It may simply serve as a way of determining what temporary path to use for use with tempnam. See the note below about DB changes.
  • DB conversion. We need to get the upgrade path from D6 working and tested. Most of the work is done, I think we just need to implement the duplicate file handling as was suggested on #329301: Rename {files} to {file} and add unique key to the filepath column.

DB Changes
The aforementioned DB changes in #111 have been made with the exception of the removal of the 'status' column. The use case at hand is that you have a temporary image that you would like to make publicly available (e.g. advanced caching). temp:// in its current incarnation returns the system temp path (a replacement for file_directory_temp()). Temp can have many different meanings. I'd rather not require a new stream wrapper for a use-case like this. For the time being I've retained the status column for the continued indication of permanent/temporary files rather than move towards something messy like public://temp://myfile.jpg.

Status: Needs review » Needs work

The last submitted patch failed testing.

Wim Leers’s picture

After a skim through this issue, I have two questions:
- I'm wondering what performance toward S3 and so on would be like. Has anybody performed benchmarks already? Shouldn't stream wrappers to potentially high-latency remote servers not be used with care, e.g. only from within cron, to prevent extremely long page rendering times? (If PHP had threads and no limited execution time, this would not be a problem, or at least much less so).
- Doesn't the lack of chmod() support have major implications? But it does seem to be in the latest patch.

Very quick code style review:
- It's not "nineth" but "ninth".
- It's not "compatability" but "compatibility".
- Tabs instead of spaces on several occasions.
- Always use braces, never omit them (in single-line if/else tests).
- As far as I know we don't mention the type of a variable in doxygen in Drupal?

aaron’s picture

"As far as I know we don't mention the type of a variable in doxygen in Drupal?"

http://drupal.org/node/1354 does not, but there are places in core that do.

jmstacey’s picture

To answer your questions.
1. Stream wrappers to potentially high-latency remote servers will need to be used with care. Any implementations will need to implement caching for best performance. This is no different than writing an Amazon S3 backed filesystem using FUSE. I think most implementations won't have a problem until you get to the point of uploading an extremely large file. At this point, any performance enhancers would rely on the wrapper implementation to do stuff in the background and such.

2. drupal_chmod() supersedes chmod() and is backwards compatible with normal paths. For streams, drupal_chmod() will pass the job off to the appropriate wrapper. In the case of public, private, temp, this just interpolates the URI and calls normal chmod(). In other cases the wrapper might just ignore the chmod and do nothing (e.g. for S3). For normal paths, drupal_chmod will just fire off a regular chmod.

- The spelling corrections are fixed and I replaced any tabs the slipped in.
- I didn't notice any if/else short statements in my skim but I must just be overlooking them because I know I've seen them. Line numbers would help.
- As far as documentation, the Doxygen page doesn't mention using it, however I find it very useful because the type is not always mentioned in the parameter description. You immediately know what's expected or to be returned without looking at the code. Some current core functions such as the existing drupal_chmod add the variable type. We should make a decisions and update the Doxygen page to specify the standard we'll use. The coding standards page also needs to be updated to reference the alternate camelCase use with SimpleTests.

jmstacey’s picture

Status: Needs work » Needs review
FileSize
210.91 KB

Just the minor modifications I mentioned before.

aaron’s picture

Status: Needs review » Needs work

potential workflow for s3 using stream wrappers:

* user uploads a file through the s3 wrapper
* s3 wrapper saves the file locally and marks it to be moved on the next cron
* later, someone wants the URL for that file
* if the file has not yet been moved, the local url is returned
* else if the admin checkbox for using s3 is turned off, the local url is returned
* else if the server has otherwise determined that s3 is down, the local url is returned
* else the remote s3 url is returned

performance should be optimal in this setup

aaron’s picture

Status: Needs work » Needs review

sorry -- will this rerun the test?

Status: Needs review » Needs work

The last submitted patch failed testing.

aaron’s picture

Fatal error: Class 'DrupalStreamWrapperRegistry' not found in /var/www/d7-stream/includes/file.inc on line 1615 when you run install.php. i'll look at that right now.

tic2000’s picture

+  // Try created the .htaccess file if it's missing from the public, private, or temp directories.
Anonymous’s picture

at #134 - that's because we use the files in the installer, but don't initialise the Drupal's stream wrappers. working on patch...

aaron’s picture

I just committed a refactoring of file_get_mimetype in git, so that it now works through the schema/uri (allowing us to set custom mimetypes for youtube videos, etc).

aaron’s picture

added constants for SCHEME_PUBLIC, SCHEME_PRIVATE, SCHEME_TEMP

aaron’s picture

FileSize
216.52 KB

this latest patch is up to #138, and also includes a new test for SimpleTest, as we now add a simpletest:// stream wrapper for testing.

RobLoach’s picture

Wow, this is awesome... Subscribing.

node:// WHAT?!

jmstacey’s picture

Current known issues:

  • Drupal installation bombs. This is caused because stream wrappers are not initialized in cases where file.inc are included outside of a bootstrap.
  • Upload tests (and possibly others) are failing. These are test issues that will be resolved with the completion of the simpletest:// stream wrapper and its integration with all file related tests.
jmstacey’s picture

FileSize
225.99 KB

Here's the latest patch. This resolves the previously known issues, and some other unexpected breakages.

Here's the most recent list of things that need work:

  • DB upgrade path from Drupal 6. Partially implemented, see todo in upload_update_7000().
  • D.O. Documentation introducing stream wrappers (see notes in #125). Search and Replace @see lines with real URL.
  • Abstract static method cleanup. Initially, there was some confusion about abstract static methods... they don't make sense [Source]. This resulted in some messy calls like demonstrated on http://drupalbin.com/10347. While static functions would be ideal, I don't know if this can be accomplished elegantly. I'll be happy with the current implementation if we only pass data once (when instantiating).
  • Completion of "default" type handling. Aaron started by implementing constants for the core stream wrappers. Hardcoding of wrappers needs to be refactored to use these constants which are declared at the top of stream_wrappers.php.
  • There are way too many assumptions about using the public wrapper. This ties in with the default type handling. I think we should provide a system default, but ultimately individual modules can determine their own behavior. Example: BlogAPI probably shouldn't assume that we're using a public filesystem.
  • Usage of normalizeUri(). Anything that has the possibility to be located in a subdirectory should use normalizeUri and assume a prepended slash. This takes care of cases such as profile images that can be configured by the user. Images might be stored in the root directory so normalizeUri will turn "public:///" into "public://" and the assumption of the forward slash will allow subdirectories such as "public://pictures" to work.
  • Test coverage, robustness, and cleanup--we can always use more. Test subdirectories, make use of simpletest://, test core stream wrapper implementations, etc.
  • Ongoing: Coding standards review and cleanup

Upcoming code sprint in Philadelphia, July 26-27.

The last submitted patch failed testing.

jmstacey’s picture

Status: Needs work » Needs review
FileSize
226.48 KB

This reroll fixes the file download test which was failing because the private file path doesn't exist on the testing slaves. This also includes cleanup of the abstract static issue and a couple other exceptions. Simpletest is reporting a failure locally, but not what test specifically, so I'm hoping the testing server will shed more light on that problem.

Status: Needs review » Needs work

The last submitted patch failed testing.

jmstacey’s picture

Status: Needs work » Needs review
FileSize
226.4 KB

Thanks to cwgordon7 this test should finally pass. We're going to look into breaking this down into smaller and more specialized patch sequences, so this may be one of the last giant patches.

Edit: Actually this won't pass because #515280: file_check_directory() should create recursively was just committed.

The last submitted patch failed testing.

jmstacey’s picture

On recommendation from cwgordon7 and drewish I have split this patch into two parts: stream wrappers (this issue), and #517814: File API Stream Wrapper Conversion. This patch will obviously need to be committed before the second.

Keep an eye on #395472: Plugin Manager in Core: Part 1 (backend) which may cause this patch to fail erroneously.

drewish’s picture

subscribing. whoops, wrong issue.

jmstacey’s picture

This patch will now need to be committed before #519392: File API Stream Wrapper Support (Part 1) followed by #517814: File API Stream Wrapper Conversion. For those following the GitHub repository there are now six branches:

Here are the most important as they build off of each other:

  1. stream_wrappers: Contains this patch. Merged with drupal (upstream/master).
  2. fileapi-1_wrappers: Introduction of stream wrappers to the File API. Pulls from stream_wrappers.
  3. fileapi-2_wrappers: Heavy lifting work on File API. Pulls from fileapi-1_wrappers
  4. master: culmination other branches. Pulls from fileapi-2_wrappers.

upstream/master: This is the Git mirror of Drupal CVS. You must add this branch yourself for each new clone using "git remote add upstream git://github.com/mikl/drupal.git"

styling_corrections: This branch contains random fixes that we noticed along the way but weren't directly affected by changes we're making. This branch should be diffed directly to upstream/master. An issue will be created for it down the road whenever someone has time.

robertDouglass’s picture

Subscribing. In awe of the persistent effort going on here. Keep it up.

drewish’s picture

Just did a quick scan to see what was left in this patch and noticed a few of things. I'm very happy with where you chose to split the patch up.

The comment in DrupalStreamWrapperRegistry::getStreamScheme() is cryptic. I'd suggest either expanding it to be proper sentences or dropping it.

Can we drop the blank first line from functions (i.e. DrupalStreamWrapperRegistry::getValidStreamScheme() and getInstanceByUri(), etc)? IMHO it just wastes space.

DrupalStreamWrapperInterface::getInternalUri()'s comment references realpath but ommits the ()s that help indicate it's a function.

"mime type" in comments should probably be MIME-type or MIME type. I'm not sure if core follows a standard.

"@return bool" I don't think our coding standards call for using the type hints.

"PHP's realpath does not..." again, realpath should have parens following it.

I hate the word interpolation in comments. Can we use something that people won't have to look up in the dictionary?

I don't think canonicalized is a real word. Lets just use "canonical, absolute path" instead.

Why does DrupalLocalStreamWrapper have private member variables? I think protected would be fine and give more flexibility to extending classes.

DrupalStreamWrapperInterface::getInternalUri() and getExternalUrl() have pretty sparse documentation. It'd be good to flush those out a bit. Perhaps with an example?

Why all the extra work in DrupalLocalStreamWrapper ::getMimeType()? Why can't we just call file_get_mimetype()? All the code looks copied and pasted directly from there.

I think the convention is to put @see tags below @return tags but in DrupalLocalStreamWrapper there's a lot of these in the reverse order. I'd go check the coding standards but I'm tired enough that odds are I wouldn't finish the review.

I'm wondering if we're going to have to conditionally define the Drupal{Private,Private}StreamWrapper classes depending on which file transfer mode is enabled...

jmstacey’s picture

I compartmentalized the changes so you can just glance at the specific changeset rather than sifting through everything again.

The comment in DrupalStreamWrapperRegistry::getStreamScheme() is cryptic. I'd suggest either expanding it to be proper sentences or dropping it.

[stream_wrappers 89fc86b]

Can we drop the blank first line from functions (i.e. DrupalStreamWrapperRegistry::getValidStreamScheme() and getInstanceByUri(), etc)? IMHO it just wastes space.

[stream_wrappers 971bd35] Personal preference for readability, but I think I got them all. What about classes though?

DrupalStreamWrapperInterface::getInternalUri()'s comment references realpath but ommits the ()s that help indicate it's a function.

"PHP's realpath does not..." again, realpath should have parens following it.

I hate the word interpolation in comments. Can we use something that people won't have to look up in the dictionary?

[stream_wrappers cbefe60]

"mime type" in comments should probably be MIME-type or MIME type. I'm not sure if core follows a standard.

[stream_wrappers 099d29e]

I don't think canonicalized is a real word. Lets just use "canonical, absolute path" instead.

[stream_wrappers cefdd692] I copied that sentence straight from PHP documentation on realpath(). I've changed as recommended.

Why does DrupalLocalStreamWrapper have private member variables? I think protected would be fine and give more flexibility to extending classes.

[stream_wrappers cdc625f5] changed to private, followed by [stream_wrappers 3b5f020c] which makes them public [we were using blind magic getters/setters, so we might as well open them up).

DrupalStreamWrapperInterface::getInternalUri() and getExternalUrl() have pretty sparse documentation. It'd be good to flush those out a bit. Perhaps with an example?

[stream_wrappers eb5f737] and [stream_wrappers 888e6b8] getInternalUri() will also contain the "@see drupal_realpath()" when the function is made available in #519392: File API Stream Wrapper Support (Part 1).

Why all the extra work in DrupalLocalStreamWrapper ::getMimeType()? Why can't we just call file_get_mimetype()? All the code looks copied and pasted directly from there.

This is indeed a copy/paste from file_get_mimetype(), but there is a very important reason. We can not simply call file_get_mimetype() with a stream because it doesn't know what to do with it. It's not something that PHP stream wrappers will handle transparently for us. We have to specifically tell it what wrapper implementation to use which is why we need a registry, so that we can follow the path backwards in a way that PHP does not natively handle. #517814: File API Stream Wrapper Conversion removes all of this code from file_get_mimetype and simply invokes the appropriate wrapper implementation. So there won't be any code duplication in this area when everything falls into place.

I think the convention is to put @see tags below @return tags but in DrupalLocalStreamWrapper there's a lot of these in the reverse order. I'd go check the coding standards but I'm tired enough that odds are I wouldn't finish the review.

[stream_wrappers 2c271b9]

"@return bool" I don't think our coding standards call for using the type hints.

[stream_wrappers 4f0f530] This will need to be brought up on the other patches as well because I did this throughout everything. This should be specifically put in the coding standards.

I'm wondering if we're going to have to conditionally define the Drupal{Private,Private}StreamWrapper classes depending on which file transfer mode is enabled...

Ideally all three wrappers will be registered and available simultaneously. At least that's the whole idea for stream wrappers. At this level of implementation it doesn't have any affect on existing Drupal functionality. Once we start get into #517814: File API Stream Wrapper Conversion we'll need to decide on a granularity. I'm in favor of just keeping a system-wide default that modules may use, but of course can override however they want. In the end, it won't matter what this default is set to because the appropriate wrapper will be determined by the URI. Se we can run public, private, (and more) setups side by side without stepping on each other.

I'm looking to bringing some of that work out of Part 2 so that users have a interface for changing the variables stream_public_path, stream_private_path, and stream_temp_path.

Here are some miscellaneous doc changes not specific to any issues raised: [stream_wrappers d504642]

Patch roll coming up in a few minutes...

jmstacey’s picture

getMimeType() is now a static method. This allows file_get_mimetype() to be backwards compatible and be moved from part 2 to part 1.

There is a reason that this is the only static method while the others are not, and I'll try to explain. First, read "Why does PHP 5.2 disallows abstract static classes?." The first problem is that getDirectoryPath() is not part of the wrapper interface, it is something specific to local Drupal wrappers, and so is part of the class not the interface. The second problem is that the implementation of getDirectoryPath varies depending on what wrapper we're dealing with. So, even if you can add an abstract static method to a class (I can on my test box), there is no way to call it through the parent because it belongs to the child.

Hopefully that makes sense and clarifies some of the design decisions that were made.

aaron’s picture

we need to also keep an eye on #391330: File Field for Core, as i suspect if it gets in first it will impact this patch (and vice versa). i'll review both these issues more this weekend with that in mind.

quicksketch’s picture

Following to keep track of this issue. Just at first glance, it looks like the getMimeType() method includes the same bug as noted in #520664: file_get_mimetype() should not be case-sensitive.

SeanBannister’s picture

Sub

jmstacey’s picture

Here's a new one.

Changes
- file_get_mimetype() case sensitivity fix brought forward from #519392: File API Stream Wrapper Support (Part 1).
- __put() fix brought forward from #517814: File API Stream Wrapper Conversion.
- normalizeUri() now removes the trailing slash like the documentation indicates.
- New unit tests for getStreamTarget() and normalizeUri().

jmstacey’s picture

I took a quick look at #391330: File Field for Core and from what I can tell it will only really affect #517814: File API Stream Wrapper Conversion. I've added a note to that issue.

This patch no longer contains any changes or removals. Everything is brand new. It's really quite boring in comparison to the merges on the other two branches :-)

aaron’s picture

Note: #531476: Add favicon mimetype theme setting removes the included file.inc in the maintenance theme, which was complicating matters somewhat.

pwolanin’s picture

Converted static class to a Drupal hook

dopry’s picture

You might want to remove file_put_contents('/tmp/log.txt', print_r($wrappers,1)); from line 67 of the patch.

The parse_uri_scheme function is incorrect. The URI format according to the rfc is : the // is specifically the common internet scheme indicator. I understand that PHP internally makes the :// assumption as well, but it breaks relative file paths file:path/to/file.ext and other scheme formats such as mailto:user@examples.com.
Just because jani@php.net is too good for standards doesn't mean drupal shouldn't do it's best to adhere to them properly, see http://bugs.php.net/bug.php?id=47070. At the very least some pressure should be applied to the PHP folk to get them to recognize and correct their improper support of the URL format. I guess it could live until php gets in line, but the reasoning for not properly parsing scheme URI's should be noted.

pwolanin’s picture

@dopry - sorry about the leftover debug code.

Also, http://us.php.net/manual/en/function.fopen.php says that PHP looks for a filename of the form "scheme://...", so it would seem that we are locked into this to some extant in terms of responding to what PHP expects.

eojthebrave’s picture

Took a quick peak at this, didn't make it all the way through ran out of ice-cream. Here are the notes I kept.

Order is wrong here.

+ * Returns the entire Drupal stream wrapper registry.
+ * @return array

Lines should be reversed and the "Returns the ..." line should be indented. And we don't usually specify variable types like that.

Probably don't need to have the extra newline in file_uri_scheme

"usefule" should be "useful" in comments for file_stream_wrapper_valid_scheme, there is also an extra line break between the summary and first @param

+ * A stream, referenced as scheme://target
For file_stream_wrapper_get_instance_by_uri needs a "." at the end. I think the return documentation for this is wrong to. It looks to me like you are confusing private and public. With a URI of public:// I would expect to get an instance of the Public stream wrapper class.

"container" should be "contain" in documentation for file_stream_wrapper_get_instance_by_scheme

There is an extra space here.
+ * This allows you to set the URI. Generally is only called by the factory

The $mode param for DrupalStreamWrapperInterface::chmod() needs documentation.

DrupalLocalStreamWrapper::stream_open() documents @param &$opened_path, but uses a variable named $opened_url

I think you might be missing an & here.
+ if ((bool)$this->handle && $options & STREAM_USE_PATH) {
That or I just don't know what using a single & operator does.

The $operation param for DrupalLocalStreamWrapper::flock() is un-documented.

drewish’s picture

i've fixed the issues identified by eojthebrave in the git repository. we'll re-roll later today.

drewish’s picture

FileSize
33.39 KB

okay here's the above changes with some additional docs for the hooks we've introduced.

Status: Needs review » Needs work

The last submitted patch failed testing.

jmstacey’s picture

FileSize
33.73 KB

Made a few tiny doc and whitespace fixes.

That or I just don't know what using a single & operator does.

Bitwise operator.

jmstacey’s picture

Status: Needs work » Needs review

Setting to needs review.

dopry’s picture

The bug is back open in PHP land... regarding :// the point is that it doesn't match the URL standard and should be documented as there is a good chance PHP will support well formed URL's properly in the future.

Also how did this patch end up in the 'file' name space? Was there strong resistence to 'stream'? if node:// and mailto:, etc are possibilities... why bury it in the file namespace?

pwolanin’s picture

@dopry - the functions are in file.inc and our main use in core will be for files, so I think it's a reasonable name space to use.

aaron’s picture

Status: Needs review » Needs work

@dopry - I agree that we should document that :// business so it can be addressed at a future point.

aaron’s picture

Status: Needs work » Needs review

oops, sorry, didn't mean to reset the status...

pwolanin’s picture

New patch with :// issue documents and removing uneeded constants. Also makes sure stream wrappers get registered during bootstrap.

Status: Needs review » Needs work

The last submitted patch failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
34.35 KB

oops - the Class needs to be in file_test.module

pwolanin’s picture

Minor doxygen fixes, removed t() from getInfo(), no real code changes.

pwolanin’s picture

wrong patch. This one has the t() fix

aaron’s picture

Status: Needs review » Reviewed & tested by the community

this looks good to me. let's get this one in!

Dries’s picture

Status: Reviewed & tested by the community » Needs work

I reviewed this in private with the sprint team, through pwolanin. I should probably have shared my review publicly. The next re-roll should be commit-able.

pwolanin’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
33.93 KB

final version with more comment cleanup and CHANGELOG

pwolanin’s picture

CHANGELOG typo

pwolanin’s picture

Doxygen typo also noticed by moshe.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD! Thanks all ... onto the next issue.

dopry’s picture

/me waves excitedly from the peanut gallery.

aaron’s picture

In case anyone's looking for the next patch: #517814: File API Stream Wrapper Conversion

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

hass’s picture

How bad that the _file_remove_dot_segments (#9, #11) function have not found the way into core! I would be really happy if this can get into core and I can rely in modules on this code. I have an open issue in #832388: Invalid URI's not normalized / dot-segments not removed that need uri normalization functions, but all the functions I've found in core have missleading names and do not really do uri normalization except "schema" normalization.

It would be good to clean up core or implement all uri normalization functions required by RFC 3986. Code in core should be reusable...

kukle’s picture

Subscribing