In some cases, busyness import rules for the XML data are very hard to implement just with the XPath expression.
It is common to have lots of conditions about some import values, comming from more than one child or attribute, that are very hard to implement in the XPath 1.0. It should be a nice feature to create a hook that will get the currently parsed DOMNode and process it to determine wether it is required or not (additional filter ontop the XPath expression).

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

ndobromirov’s picture

Added the hook implementation.

ndobromirov’s picture

Added more context information to the filtering hook. The previous patch can be ignored.

ndobromirov’s picture

The last patch generates variable name conflicts within the parse method. Adding a new patch to fix the issue.

twistor’s picture

I'd be curious to know what your use case is, have you seen Feeds Tamper?

+++ b/FeedsXPathParserBase.incundefined
@@ -31,8 +31,8 @@ abstract class FeedsXPathParserBase extends FeedsParser {
-  public function parse(FeedsSource $source, FeedsFetcherResult $fetcher_result) {
-    $source_config = $source->getConfigFor($this);
+  public function parse(FeedsSource $feeds_source, FeedsFetcherResult $fetcher_result) {

Why is this being changed? I don't follow.

+++ b/FeedsXPathParserBase.incundefined
@@ -57,6 +57,11 @@ abstract class FeedsXPathParserBase extends FeedsParser {
+      // Invoke a hook to checke wether the node should be skipped.

checked

+++ b/FeedsXPathParserBase.incundefined
@@ -57,6 +57,11 @@ abstract class FeedsXPathParserBase extends FeedsParser {
+      if(in_array(true, module_invoke_all('feeds_xpathparser_filter_node', $feeds_source, $node,), true)) {

TRUE

twistor’s picture

Status: Active » Needs work
ndobromirov’s picture

Thanks for the review.

Answer of Q1:
I am using feeds tamper, but it is a tool for additional processing of mapped field(s). Correct me if something more can be done with it.

My use case is that I want to filter the context nodes before passing them for processing with feeds tamper.
Normally this is implemented with the "Context" XPath expression in the parser settings, but in my case the expression is becoming too large, complex with a lot of sub-expression repetitions within it - hard to maintain/change. In some of the use cases I need to be able to use node set manipulations like intersection and subtraction when XPath 1.0 allows only union with the "|" syntax.

This is the reason that a hook was implemented, with some PHP code all of the above was solved in a matter of 2-3 screens of code.

Answer of Q2:
The parameter name $source in the original implementation needs to be of type FeedsSource and it is passed as additional context information to the filtering hook. Part of the process method without rename of the variable would look like this:

...
foreach ($all_nodes as $node) {
      if(in_array(true, module_invoke_all('feeds_xpathparser_filter_node', $source, $node), true)) {
        continue;
      }

      $parsed_item = $variables = array();
      foreach ($source_config['sources'] as $source => $query) {
        ...
      }
      ...
    }
....

The inner loop defines iteration key variable with the same name as the parameter, replacing the FeedsSource with a string on the first iteration, making the call to the filtering hook with wrong source context in the second iteration of the outer loop.

Answer of Q3:
Thanks, for spotting the typo.

Answer of Q4:
Will fix the constants to the convention specs.

twistor’s picture

Version: 7.x-1.0-beta3 » 7.x-1.x-dev

1. That is true. Feeds Tamper can do filtering, but you lose the context of the XML. I am familiar with how unruly the XPath expressions can get. Could we call the hook 'feeds_xpathparser_filter_domnode' or something so that it's not confused with node.module? Maybe pass in the DOMDocument as well so you could do more than just filter, like append children or something.

2. Good catch. I'm surprised this hasn't caused trouble already. Not to be picky, but we should leave $source in the function signature, and change foreach ($source_config['sources'] as $source => $query) { to foreach ($source_config['sources'] as $element_key => $query) {. The $source in the function is pretty standard.

You might also be interested in #1846956: Allow using safe PHP functions.

ndobromirov’s picture

Thanks for the fast response.

1. I will rename the hook and add the document parameter.

2. I will do the refactoring and upload a patch with all the discussed fixes.

Best regards,
Nikolay D.

ndobromirov’s picture

Adding the patch fixing the issues discussed.

twistor’s picture

Version: 7.x-1.x-dev » 6.x-1.x-dev
Assigned: ndobromirov » twistor
Status: Needs work » Patch (to be ported)
osopolar’s picture

TODO: The hook hook_feeds_xpathparser_filter_domnode() needs to be documented somewhere - for example in a README.txt.

jenlampton’s picture

Issue summary: View changes

Also could be useful for documentation...

If you want to actually *see* the DOMNode you need to do this...

  $temp_dom = new DOMDocument();
  $temp_dom->appendChild($temp_dom->importNode($node,true));
  $debug = $temp_dom->saveHTML();

  dpm($debug);
twistor’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
Status: Patch (to be ported) » Needs review
FileSize
1.24 KB

Would something like this work?

Words are hard.

osopolar’s picture

Looks nice, thanks @twistor.

  • twistor committed b895fea on 7.x-1.x
    Issue #1945732 by ndobromirov, twistor: Add docs.
    
twistor’s picture

Version: 7.x-1.x-dev » 6.x-1.x-dev
Assigned: twistor » Unassigned
Status: Needs review » Patch (to be ported)
twistor’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
Status: Patch (to be ported) » Closed (fixed)

6.x usage is declining slowly. If anyone feels like backporting go ahead.