While working on #624088: Mapper: ImageField/FileField (formalize feed elements), it appears to me that, in order to provide a stable and re-usable mapper, a generalized enclosure mechanism is required.

At this time, the only parser supporting enclosures is FeedsSimplePieParser. It returns the enclosures found by SimplePie in the enclosures mapping source as SimplePie_Enclosurein in nested arrays (indexing by mime type and subtype) with the following code.

      $enclosures = $simplepie_item->get_enclosures();
      if (is_array($enclosures)) {
        foreach ($enclosures as $enclosure) {
          $mime = $enclosure->get_real_type();
          if ($mime != '') {
            list($type, $subtype) = split('/', $mime);
            $item['enclosures'][$type][$subtype][] = new $enclosure;
          }
        }
      }

In order to support enclosure, mappers have to use SimplePie_Enclosure's methods. This is what I've done currently done in #624088: Mapper: ImageField/FileField (formalize feed elements) (comment #2

This means that other parser willing to provide enclosures have to comply to its interface. ). If a parser provides enclosure using another data structure (like a string containing the enclosure URL, without more info), the mappers have to support these as well in order to be re-usable.

As generalized enclosure interface for Feeds would allow mappers (or processor) to be against a single interface. Parser would have the responsibility to provides their enclosure using this interface.

The attached patch provide the base for such an interface with an adapted FeedsSimplePieParser.

PS: I use enclosure as a generic term for file attached to feeds items.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alex_b’s picture

Title: Provides a generalized enclosure interface » Provide a generalized enclosure interface

I do like where you are going :)

I've got one concern however:

For better or worse, I've so far avoided declaring the result arrays. I've always thought declaring them would be a good idea, I just haven't gotten the time to fully think it through yet.

Defining an interface for enclosures and a wrapper for SimplePie enclosures would start doing that and actually it would break the current convention. And this is my concern. If we commit the patch above, we would have a convention that is partially undeclared ($parserResult->value array) and partially declared (enclosures).

I'm thinking we should declare all or nothing.

Here is my suggestion:

1. Simplify the enclosure array in a first patch:


$result->value[items[][enclosures][] = array(
  array(
    'url' => 'http://...',
    'description' => '...',
    'mime' => '...',
  ),
);

This does away with the weird cascading array by type and subtype.

2. Roll an initial patch for declaring most of the result arrays.

This will be most likely a longer process as it will touch more parts in feeds. I should do a little cleanup on the FeedsResult classes before anything.

Thoughts?

(PS: this patch shows tabs instead of double-spaced indents).

pbuyle’s picture

Status: Needs review » Active

I've one concern with arrays for enclosure. They are inflexible and restrain future evolution/refactoring using OO approach. It is not possible to adapt an existing structure as an array like I do with FeedsSimplePieEnclosure. It also means that even if a part of the structure is rarely used, it has to be produced before returning the array*. For instance, suppose that in a near future it is decided that it's the parser/enclosure responsibility to provide the content of an enclosure both as local file and as a string. So consumers won't have to worry about getting the actual file content themself. With arrays, the options for this are very limited. With object, implementation can do whatever fits best (lazy downloading, deferred file write, etc.).

I don't really understand the current undeclared structure convention for $parserResult. It make processors job harder because there is no enforceable contract between them and the parsers (same for $fetcherResult). I fully understand it for elements of $parserResult->items because their structure is really unknown before runtime. It relies on the parser sources and can be fully dynamics (as in the CSV parser).

IMHO
- for runtime dynamic structures, arrays as undeclared structures are a fine convention**
- for declared structures, arrays are inflexible. They reduce code maintenability and restrain evolution
- structures used to exchange data between component musts be declared unless they are runtime dynamic.

So I would prefer
- A first patch with an enclosure interface to be used in #624088: Mapper: ImageField/FileField (formalize feed elements) and #623432: Mapper for emfield.
- A second (initial) patch with declared structures (through interfaces or abstract classes) for FeedsFetcherResult, FeedsParserResult, etc. This patch would have to contains basic implementations working with the current arrays structures to ease the port of existing fetchers, parsers and processors.

These are only my preferences. If you decide to go with arrays, I would deal with it.

*Or provided though a callback mechanism in a non-OO fashion.
**In PHP 5, the magic methods __get, __set, __isset and __unset could be used for undeclared structures. But arrays seems easier.

pbuyle’s picture

Here is a fixed patch without tabs indents. It also defers handling of the SimplePie's enclosures array into FeedsSimplePieParser::getSourceElement.

pbuyle’s picture

Status: Active » Needs review
FileSize
22.67 KB

Here is a patch with interfaces for both enclosure and results. It passes all the 12 test case currently in HEAD.

It needs review and certainly more works.

PS: The code is also at https://code.launchpad.net/~mongolito404/drupal-feeds/interfaces

pbuyle’s picture

I crafted a simple interface for enclosures and fetcher results. It is probably better to use an interface compatible with the D7 File API (which I don't know yet). This should help with for the D7 version of Feeds or a switch to a D6 backport of its File API. Since PHP does duck-typing, only methods with Type Hinting will need to be modified.

pbuyle’s picture

After looking at the D7 file API, I don't see an interface to use for enclosure. The stream wrapper class is too low-level for what Feeds needs. It may be used to implement the FeedsFileInterface I defined in my patch.

There is work for backports of D7 File API to D6[1][2] so Feeds may start to use the new API as soon as a backport is ready.

[1] http://drupal.org/project/hook_file
[2] http://github.com/EugenMayer/FileApiBackport

alex_b’s picture

Quick reroll before more testing. Patch to FeedsNodeProcessor failed due to recent commit.

alex_b’s picture

I really like where this is going. Great work.

I did some simplifications. Let's use interfaces only if we actually need to handle objects of different classes as the same thing (e. g. with FeedsFileInterface for FeedsFetcherResult and FeedsEnclosureInterface) or if we need to use a class in different contexts (e. g. a FeedsPlugin is a FeedsConfigurable and a FeedsSourceInterface, as such it can be configured and stored and it can interact with a FeedsSource).

Interfaces otherwise tend to confuse as they suggest they do one of the above.

In this spirit, I:

1. Removed FeedsResultInterface (empty, we can bring it back if we need to define methods on this level)
2. Merged FeedsParserResultInterface, FeedsSyndicationParserResultInterface, FeedsSyndicationParserResult into FeedsParserResult: I think there is a large value in keeping these results absolutely normalized. It simplifies the code consuming these results.
3. Created FeedsFetcherResult that unifies some common functionality between FeedsFileFetcherResult and FeedsHTTPFetcherResult

Further, I've

4. Renamed FeedsResultEnclosure to FeedsEnclosureInterface
5. Moved FeedsSimplePieEnclosure to FeedsParser

Finally:

6. Have you thought about how FeedsSyndicationParser will handle enclosures? Ideally we can create a common FeedsEnclosure class in FeedsParser.inc that is used as-is by FeedsSyndicationParser directly and by FeedsSimplePieParser by extension with FeedsSimplePieEnclosure.

pbuyle’s picture

FeedsSimplePieEnclosure's getContent and getFile could be extracted in a common abstract class to be sub-classed by any parser producing enclosures from URLs. A default implementation can also be provided for getMimeType, using Drupal's file_get_mimetype which also works on URLs. For completion, a base class providing the description and url getters can also be added.

abstract class FeedsAbstractEnclosure implements FeedsEnclosureInterface {

  public function getMimeType() {
    return file_get_mimetype($this->getUrl())
  }

  public function getContent() {
  feeds_include_library('http_request.inc', 'http_request');
    $result = http_request_get($url);
    if ($result->code != 200) {
      throw new Exception(t('Download of @url failed with code !code.', array('@url' => $url, '!code' => $result->code)));
    }
    return $result->data;
  }

  public function getFile() {
    if(!isset($this->file)) {
      //@todo get extension from url or mime_type
      $dest = file_destination(file_directory_temp() . '/' . get_class($this). '.tmp', FILE_EXISTS_RENAME);
      $file = file_save_data($this->getContent(), $dest);
      if($file === 0) {
        throw new Exception(t('Cannot write content to %dest', array('%dest' => $dest)));
      }
    }
    return $this->file;
  }
}

class FeedsBaseEnclosure extends FeedsAbstractEnclosure {
  private $description;
  private $url;

  function __construct($url, $description = '') {
    $this->description = description;
    $this->url = $url;
  }

  function getUrl() {
    return $url;
  }

  function getDescription() {
    return $description;
  }
}
alex_b’s picture

What about?


class FeedsEnclosure implements FeedsFileInterface {
  
  protected $description;
  protected $url;

  function __construct($url, $description = '') {
    $this->description = description;
    $this->url = $url;
  }

  public function getMimeType() {
    return file_get_mimetype($this->getUrl());
  }

  public function getContent() {
    feeds_include_library('http_request.inc', 'http_request');
    $result = http_request_get($this->url);
    if ($result->code != 200) {
      throw new Exception(t('Download of @url failed with code !code.', array('@url' => $url, '!code' => $result->code)));
    }
    return $result->data;
  }

  public function getFile() {
    if(!isset($this->file)) {
      //@todo get extension from url or mime_type
      $dest = file_destination(file_directory_temp() . '/' . get_class($this). '.tmp', FILE_EXISTS_RENAME);
      $file = file_save_data($this->getContent(), $dest);
      if($file === 0) {
        throw new Exception(t('Cannot write content to %dest', array('%dest' => $dest)));
      }
    }
    return $this->file;
  }

  function getUrl() {
    return $this->url;
  }

  function getDescription() {
    return $this->description;
  }
}
alex_b’s picture

Title: Provide a generalized enclosure interface » Consolidate import stage results and provide generalized enclosure class
FileSize
21.22 KB

I haven't had the time to work much on this problem during the last week, but I've been thinking about this problem all along. Finally I had some undisturbed time and could write code.

This patch introduces a new FeedsFeed class that unifies FeedsFetcherResult / FeedsParserResult. In its lifetime, a FeedsFeed object is being fetched, parsed and processed before it is released. It is relatively light and its easy to imagine how a FeedsEnclosure class could share a common super class with FeedsFeed. Another option would be to use a FeedsFeed object itself for representing an enclosure (need to think more about this).

Further, a FeedsFeed has inherently a status. And this is what I was looking for in regards to #600584: Use Batch API. If parsing or a processing stage has not finished, a FeedsFeed can be cached with a source and the next time a source is being invoked for importing, it can pick up with the cached FeedsFeed object instead of fetching/parsing a new one.

alex_b’s picture

FileSize
24.89 KB

Sorry, forgot to add new file. All tests passing.

alex_b’s picture

Status: Needs review » Needs work
FileSize
18.93 KB

[edit: posted on the wrong issue, please ignore post]

alex_b’s picture

I am starting to think that we don't need heavy enclosure handling. I'd much rather go with simple (arrays of) URLs until we have a real need for heavier enclosure handling #624088-19: Mapper: ImageField/FileField (formalize feed elements)

alex_b’s picture

Status: Needs work » Needs review
FileSize
24.93 KB

- Renamed nextItem() to shiftItem() as the method consumes the items array.

This is pretty much ready to be committed from my point of view. mongolito404: I'd love to get in touch with you on IRC in regards to the enclosure handling.

alex_b’s picture

FileSize
25.29 KB

- Calling FeedsFeed FeedsImportBatch in anticipation of #600584: Use Batch API where we will add persistence to imports and where we'll need a corresponding FeedsBatch class for clear() operations.
- I will commit this to pave the path for Ian Ward working on a Mailhandler port to Feeds that will make heavy use of this patch.

alex_b’s picture

Status: Needs review » Fixed

Committed. Setting this issue to fixed as this patch consolidated the import stage results and the open generalization of enclosure handling is captured by this patch:

#624088-28: Mapper: ImageField/FileField (formalize feed elements)

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.