Problem/Motivation

The FileCopy plugin currently (barely) supports remote file sources, using copy(). It's a dirty hack; we should replace it with a plugin dedicated to remote file processing, that uses the Drupal API (read: Guzzle).

Proposed resolution

  1. Create a Download process plugin to handle remote files through the Drupal API.
  2. Remove remote file handling capabilities from the FileCopy process plugin, so it can use file_unmanaged_copy() instead of copy().

The plugin should support a similar configuration structure to FileCopy. Something like:

uri:
  plugin: download
  source: source_relative_path
  guzzle_options:
    base_uri: http://example.com/
# anything else that goes into Guzzlehttp\Client::__construct()

Remaining tasks

Write patch

User interface changes

n/a

API changes

Yes, if you were using FileCopy with a remote file source, you need to now use the RemoteFile process plugin in your migration yml.

Data model changes

n/a

CommentFileSizeAuthor
#57 followup-2783075-57.patch696 bytesclaudiu.cristea
#50 2783075-50.patch20.79 KBclaudiu.cristea
#43 interdiff.txt1.18 KBclaudiu.cristea
#43 2783075-43.patch20.87 KBclaudiu.cristea
#36 interdiff-2783075-31-36.txt1.12 KBphenaproxima
#36 2783075-36.patch20.79 KBphenaproxima
#31 interdiff-2783075-30-31.txt4.91 KBphenaproxima
#31 2783075-31.patch19.74 KBphenaproxima
#30 2783075-30.patch15.36 KBphenaproxima
#25 interdiff-2783075-23-25.txt11.4 KBphenaproxima
#25 2783075-25.patch20.51 KBphenaproxima
#23 interdiff-2783075-21-23.txt5.95 KBphenaproxima
#23 2783075-23.patch8.66 KBphenaproxima
#21 interdiff-2783075-19-21.txt4.86 KBphenaproxima
#21 2783075-21.patch7.77 KBphenaproxima
#19 interdiff-2783075-10-19.txt8.15 KBphenaproxima
#19 2783075-19.patch8.95 KBphenaproxima
#10 download-plugin-2783075-10.patch9.16 KBohthehugemanatee
#4 download-plugin-2783075.patch10.42 KBohthehugemanatee
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

ohthehugemanatee created an issue. See original summary.

chx’s picture

Issue summary: View changes
chx’s picture

Title: Add a "remote file" process plugin, remove remote capability from FileCopy » Add a "download" process plugin, remove remote capability from FileCopy
ohthehugemanatee’s picture

Status: Active » Needs review
FileSize
10.42 KB

Here's a first stab at a Download plugin and tests. My local environment is borked for filesystem-related tests, so let's see how the testbot does with it.

This patch can't be completed until #2695297: Refactor EntityFile and use process plugins instead gets in, because the FileCopy plugin won't exist in core until then... so there's nowhere to remove copy() from.

Status: Needs review » Needs work

The last submitted patch, 4: download-plugin-2783075.patch, failed testing.

chx’s picture

I will admit I am not particularly well versed in Guzzle API. I thought you needed to pass the options to the Client constructor but I see request also has options. Could anyone find documentation what each of these are ?

chx’s picture

ohthehugemanatee’s picture

You can pass the options to the constructor and get a response in a single call, or build the request in steps. Agreed that it's better to stream directly to the fs, though.

Also, what's the right way to get the mocked http_client to the plugin? Do I have to mock the whole service container?

chx’s picture

No, you instantiate the class direct with a mock ClientInterface -- you need to fix the typehint while at it.

$client = $this->prophesize(ClientInterface::class);
$client->request->willReturn...
new WhateverPluginClass(.... $client->reveal());
ohthehugemanatee’s picture

Hm. This version streams the file directly to the filesystem... but there's less to test. Since the actual file write is handled by Guzzle, it's outside of the SUT. So I really only have to test that the file is created with the right name.

I feel like this test is more or less immaterial. The only (little bit) of complexity in the plugin is deciding if the directory is writable, and that's the part we don't test at all. We're basically just testing a wrapper around file_destination() here. Thoughts?

ohthehugemanatee’s picture

Status: Needs work » Needs review
chx’s picture

If you manage to get away with not writing a test please file an issue to remove every source plugin unit test. We need your magic for those too.

ohthehugemanatee’s picture

The only way I can see to add more test coverage here is to duplicate tests on Guzzle, file_destination, or file_prepare_directory . Simple plugin with simple tests. Honestly the tests that we DO have are already more duplicative of file_destination than I would like.

This is ready for human review/testing.

phenaproxima’s picture

Assigned: Unassigned » phenaproxima

I like this approach and would like to see it in Migrate, and I'd like to see a new process plugin be created to delegate the dirty work of file transfer to either the existing file_copy plugin or this new download plugin, depending on whether the input value is local or remote. Assigning to myself for review.

phenaproxima’s picture

Status: Needs review » Needs work
  1. +++ b/core/modules/migrate/src/Plugin/migrate/process/Download.php
    @@ -0,0 +1,159 @@
    + * Copy a file from one place into another.
    

    This makes it sound as though the plugin supports local files, which it does not. Can it be rephrased?

  2. +++ b/core/modules/migrate/src/Plugin/migrate/process/Download.php
    @@ -0,0 +1,159 @@
    +   * Constructs a file_copy process plugin.
    

    Wrong plugin ID.

  3. +++ b/core/modules/migrate/src/Plugin/migrate/process/Download.php
    @@ -0,0 +1,159 @@
    +   * @param \GuzzleHttp\Client $http_client
    

    Missing description. Also, if possible it would preferable to type hint to an interface here, if one exists, rather than a concrete class.

  4. +++ b/core/modules/migrate/src/Plugin/migrate/process/Download.php
    @@ -0,0 +1,159 @@
    +    // Make the request. Guzzle throws an exception for anything other than 200.
    +    try {
    +      $this->httpClient->request('GET', $source, $this->configuration['guzzle_options']);
    +    }
    

    Before we do this, can we call parse_url($source) to validate that it's a real URL, and throw an \InvalidArgumentException or MigrateException if it's not? Just for sanity checking purposes.

  5. +++ b/core/modules/migrate/src/Plugin/migrate/process/Download.php
    @@ -0,0 +1,159 @@
    +      throw new MigrateException("Could not download remote file $source. Request failed with $response->getStatusCode(): $response->getReasonPhrase()");
    

    Not a big deal, but I would rather use sprintf() here instead of embedding method calls directly in the string.

  6. +  protected function getOverwriteMode() {
    +    if (!empty($this->configuration['rename'])) {
    +      return FILE_EXISTS_RENAME;
    +    }
    +    return FILE_EXISTS_REPLACE;
    +  }
    

    This is lifted right out of file_copy. I'm not sure if this would be too far out of scope, but IMHO this should be inherited from a new base class that both FileCopy and this class can extend.

  7. +  protected function getDirectory($uri) {
    +    $dir = $this->fileSystem->dirname($uri);
    +    if (substr($dir, -3) == '://') {
    +      return $this->fileSystem->realpath($dir);
    +    }
    +    return $dir;
    +  }
    

    Same here.

  8. +++ b/core/modules/migrate/tests/src/Unit/process/DownloadTest.php
    @@ -0,0 +1,97 @@
    +class DownloadTest extends FileTestBase {
    

    I'd like to see tests added that verify the correct exceptions are thrown if the destination directory is not writeable.

  9. +++ b/core/modules/migrate/tests/src/Unit/process/DownloadTest.php
    @@ -0,0 +1,97 @@
    +  /**
    +   * @var FileSystemInterface
    +   */
    +  protected $fileSystem;
    

    Missing description.

  10. +++ b/core/modules/migrate/tests/src/Unit/process/DownloadTest.php
    @@ -0,0 +1,97 @@
    +  public function testOverwritingDownload() {
    +
    +    // Source won't matter, the mock will always return the same anyway.
    

    Missing doc comment, and there's an extra empty line.

  11. +++ b/core/modules/migrate/tests/src/Unit/process/DownloadTest.php
    @@ -0,0 +1,97 @@
    +   * @throws \Drupal\migrate\MigrateException
    

    This method does not throw MigrateException.

  12. +++ b/core/modules/migrate/tests/src/Unit/process/DownloadTest.php
    @@ -0,0 +1,97 @@
    +    $http_client->expects($this->any())->method('request')
    +      ->willReturn(new Response(200, array(), $body));
    

    It would be useful to test that the Download plugin handles HTTP failures as expected.

chx’s picture

@phenaproxima are you still working on this?

phenaproxima’s picture

@chx: Yes, I still want this to get into Migrate. I don't want to actually code the patch, though, so that I can maintain RTBC privileges...unless you're willing to OK it once it's ready?

chx’s picture

Sure I will review. I was only asking because you assigned it to yourself. Please, finish it, I will review.

phenaproxima’s picture

Status: Needs work » Needs review
FileSize
8.95 KB
8.15 KB

Okay, this oughta pass the tests! All of my comments are either addressed, or otherwise obviated.

chx’s picture

Status: Needs review » Needs work
+    $configuration += array(
+      'rename' => FALSE,
+      'guzzle_options' => array(),
+    );

I wouldn't use the long syntax these days for a brand new file. [] is your friend.

+      $destination_stream = fopen($final_destination, 'w');
+      if (!$destination_stream) {

Oh sweet summer child, you didn't think PHP has a gentle error passing way just like that? http://php.net/manual/en/function.fopen.php

If the open fails, an error of level E_WARNING is generated. You may use @ to suppress this warning.

I would use a ?: for getOverwriteMode and perhaps remove the method entirely. Is there a reason

$replace = !empty($this->configuration['rename']) ? FILE_EXISTS_RENAME : FILE_EXISTS_REPLACE;

is not working?

Why do we need to realpath the directory...? I thought mkdir supports streams https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWra...

+    $this->httpClient->request('GET', $source, $this->configuration['guzzle_options']);

Yay for guzzle_options. However, $this->httpClient->get is ought to work. https://github.com/guzzle/guzzle/blob/master/docs/quickstart.rst

Magic methods on the client make it easy to send synchronous requests:

$response = $client->get('http://httpbin.org/get');

This will break your test.

+    // $source must be a valid URL.
+    if (parse_url($source) === FALSE) {
+      throw new \InvalidArgumentException('Source URL is not valid');

http://php.net/parse_url

This function is not meant to validate the given URL

Why on earth are we validating the URL anyways? The real litmus test will be done by Guzzle: if it downloads, it's valid if it doesn't who cares whether because it's 404 or because someone tried to download a nonURL?

phenaproxima’s picture

Status: Needs work » Needs review
FileSize
7.77 KB
4.86 KB

@chx: this patch addresses all of your good points.

Regarding the error triggered by fopen() -- I discovered this while trying to write a test of a non-writable destination URI, which with core's PHPUnit configuration causes the test to fail, and rightfully so. That's why there's no explicit test of fopen() failing.

I'm marking this patch for review so that the tests will run, but even if it looks good, we're not quite done here -- I still need to remove remote support from FileCopy.

chx’s picture

> That's why there's no explicit test of fopen() failing.

No, the problem is the lack of @

phenaproxima’s picture

I didn't understand why we wanted to suppress fopen errors, but looking at it again I think I do. We want to stop PHP's error handling so we can try to prepare the directory first before we fail with an exception. Seems legit. Fixed and added a test of that.

chx’s picture

Wonderful, wonderful. Now on to removing the hack from the FileCopy plugin before we release 8.2.0. with that in it.

phenaproxima’s picture

This oughta do 'er. Also did a bit of cleanup/renaming -- CopyFileTest should have always been FileCopyTest.

mikeryan’s picture

Status: Needs review » Needs work

This regresses the migrations currently using file_copy - they will only work with local files. Thus, you would no longer be able to enter http://www.example.com/ for the source_base_path in migrate_drupal_ui. There needs to be some means to handle both - either migrate_drupal_ui swapping file_copy for download when it sees an HTTP url, or a wrapper process plugin that will select one or the other as needed.

chx’s picture

Status: Needs work » Needs review

This is why the ur-issue introducing this hack should never have been committed in the first place: it was cleanly a hack and that hack has been repeatedly used to justify this capability and now it's called a regression now that we clean up. It was bad enough to not block FileCopy on this issue; we won't block unhacking this finally on UI problems. There's always the next commit.

phenaproxima’s picture

Status: Needs review » Needs work

I concur with @mikeryan. We should maintain backwards compatibility and accommodate naive migrations that try to feed a remote URL into FileCopy. It shouldn't be too difficult to analyze the input to FileCopy and have it invoke the download plugin if it the input does not represent a local resource.

chx’s picture

Status: Needs work » Needs review

Followup.

phenaproxima’s picture

Version: 8.2.x-dev » 8.3.x-dev
FileSize
15.36 KB

This isn't RC eligible, so it's never going to get into 8.2.x. Rerolled against 8.3.x.

phenaproxima’s picture

Status: Needs review » Needs work

The last submitted patch, 31: 2783075-31.patch, failed testing.

The last submitted patch, 31: 2783075-31.patch, failed testing.

phenaproxima’s picture

Issue tags: +Dublin2016
phenaproxima’s picture

Issue tags: +Migrate BC break

Discussed briefly with @alexpott, and we think it's OK to commit this as a BC breaker, with a change record.

chx’s picture

http://docs.guzzlephp.org/en/latest/request-options.html#sink doesn't document whether Guzzle closes the sink or not. I am really wary of this because if it doesn't then downloading lots of files will end in tears. Please investigate -- while I am unaware of a simple boolean check whether a stream resource is closed or not, you can check its type:

$fp=fopen('index.php', 'r');print get_resource_type($fp) ."\n";fclose($fp);print get_resource_type($fp) ."\n";
stream
Unknown

Edit: to be clear I do not mean to change the patch to include a get_resource_type call but to temporarily add one and if it's not closed then add an fclose to the codebase. Or just add the fclose() and run a test and see whether you get a warning of trying to close an already closed stream.

heddn’s picture

Issue tags: -Migrate BC break

I don't think this actually *breaks* BC. It just gives us some additional options and can be added in a non-BC breaking manner. Removing tag. It can be re-added if you guys disagree.

phenaproxima’s picture

@chx: I tested this manually after adding an fclose() call after Guzzle does its thing. I can confirm that fclose() produces a warning: "supplied resource is not a valid stream resource". I take that to mean it has closed the sink after downloading.

claudiu.cristea’s picture

Version: 8.3.x-dev » 8.2.x-dev
Status: Needs review » Reviewed & tested by the community
Issue tags: -Dublin2016

This is a very nice feature and patch. I tested it on a real project and works very well. I see that @chx's concern from #37 has a reply in #39.

I don't understand why this cannot go in 8.2.x as we are not yet in beta stability.

claudiu.cristea’s picture

Assigned: phenaproxima » Unassigned

.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs change record
+++ b/core/modules/migrate/src/Plugin/migrate/process/FileCopy.php
@@ -84,8 +96,14 @@ public function transform($value, MigrateExecutableInterface $migrate_executable
+      return $this->downloadPlugin->transform($value, $migrate_executable, $row, $destination_property);

Should we return here or should we still do the file_exists() check - because after this it should exist - right?

Are there any BC implications of making this change? According to the IS there are. We need a CR to tell people exactly what to do.

claudiu.cristea’s picture

Status: Needs work » Needs review
Issue tags: -Needs change record
FileSize
20.87 KB
1.18 KB

Yes, I think it make sense to test the existence first.

Are there any BC implications of making this change? According to the IS there are. We need a CR to tell people exactly what to do.

No, the "BC break" was removed in #31.

phenaproxima’s picture

Wait, what? Won't putting the file_exists() check before the isLocalUri() call break BC like a twig? If the incoming value is a remote URL, file_exists() will return FALSE and will throw an exception before the download plugin can be invoked. I don't see how #43 maintains BC...?

claudiu.cristea’s picture

@phenaproxima, quoting from http://php.net/manual/function.file-exists.php

As of PHP 5.0.0, this function can also be used with some URL wrappers. Refer to Supported Protocols and Wrappers to determine which wrappers support stat() family of functionality.

phenaproxima’s picture

"Some URL wrappers" is not reassuring. Given that custom stream wrappers are a thing, FileCopy has no way of knowing if the return value from file_exists() will be accurate.

It is definitely not a BC break if we do the file_exists() check after the isLocalUri() check; on the other hand, if file_exists() come first, the level of BC breakage depends on the whims of the stream wrapper(s) involved, and how the user's PHP interpreter is configured. I can see it causing many hard-to-trace, use case-specific bugs, and as far as I can tell there is no need for us to go down that path.

Can we restore it back to the way it was? What do we gain by moving the file_exists() check? If leaving it after isLocalUri() means we have to write a change record, I'm happy to do that and keep it robust and clear, rather than introduce potentially unpredictable WTFs.

mikeryan’s picture

If http were one of the stream wrappers file_exists() works with, we wouldn't have needed any of this - the file_copy plugin would just use file_unmanaged_copy() and it would just work.

This does make me think, however, that there is a bit of a BC break here - the download plugin is HTTP-specific, so if someone previously was using, say, ftp://, that won't work any more. We haven't documented or shown any examples of anything other than http and local files, however, so that might not be a concern.

phenaproxima’s picture

@mikeryan: That seems like a somewhat different issue to me. It means we'd have to turn the download plugin into a delegator (one for Guzzle, one for FTP, one for SCP, one for carrier pigeon, etc.), but that's outside the scope of this issue.

Status: Needs review » Needs work

The last submitted patch, 43: 2783075-43.patch, failed testing.

claudiu.cristea’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
20.79 KB

Hm, file_exists() doesn't work because HTTP & HTTPS does not support stat() (see http://php.net/manual/en/wrappers.http.php).

Reuploading the patch from #36 and setting back to RTBC as @alexpott's question was answered in #43. Quoting:

No, the "BC break" was removed in #31.

Sorry for creating confusion.

alexpott’s picture

+++ b/core/modules/migrate/src/Plugin/migrate/process/FileCopy.php
@@ -84,8 +96,14 @@ public function transform($value, MigrateExecutableInterface $migrate_executable
+      return $this->downloadPlugin->transform($value, $migrate_executable, $row, $destination_property);

All I meant is can we remove the return? Because after calling this the file will exist right?

phenaproxima’s picture

I don't think we should remove the return, because the download plugin might not ultimately decide to save the downloaded data to the same place that FileCopy would have. At the moment it does, but FileCopy should not be making assumptions about Download's inner workings. Once it delegates the transformation to the download plugin, that should be where it ends for FileCopy.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed 17c15fb to 8.3.x and 1598399 to 8.2.x. Thanks!

As migrate is still experimental committing to both branches.

  • alexpott committed 17c15fb on 8.3.x
    Issue #2783075 by phenaproxima, claudiu.cristea, ohthehugemanatee, chx,...

  • alexpott committed 1598399 on 8.2.x
    Issue #2783075 by phenaproxima, claudiu.cristea, ohthehugemanatee, chx,...
neclimdul’s picture

Status: Fixed » Needs work
+++ b/core/modules/migrate/tests/src/Unit/process/DownloadTest.php
@@ -0,0 +1,128 @@
+namespace Drupal\Tests\migrate\Unit\process;
...
+use Drupal\KernelTests\Core\File\FileTestBase;
...
+class DownloadTest extends FileTestBase {

FileTestBase is not a unit test so this is not a unit test. phpunit --testsuite=unit is broken as it requires a database after this commit.

claudiu.cristea’s picture

Status: Needs work » Needs review
FileSize
696 bytes

@neclimdul, that's right

claudiu.cristea’s picture

Title: Add a "download" process plugin, remove remote capability from FileCopy » Followup: Add a "download" process plugin, remove remote capability from FileCopy
phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

Preemptively re-RTBC on the assumption that Drupal CI will pass it.

neclimdul’s picture

+1. Manually tested and all is well. Thanks for the quick follow up.

alexpott’s picture

Title: Followup: Add a "download" process plugin, remove remote capability from FileCopy » Add a "download" process plugin, remove remote capability from FileCopy
Status: Reviewed & tested by the community » Fixed

Committed and pushed 9e822c0 to 8.3.x and b22f95a to 8.2.x. Thanks!

Opening a new issue would have been better tbh - this just pollutes the history because now there are two commits with the same title :(

  • alexpott committed 1d24d52 on 8.3.x
    Issue #2783075 by claudiu.cristea: Followup: Add a "download" process...

  • alexpott committed b22f95a on 8.2.x
    Issue #2783075 by claudiu.cristea: Followup: Add a "download" process...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.