Hi all,

I am posting here because the QueryPath module maintainers have not answered my questions for over a week. I tried XPath first but could not get it working 100%, and I was hoping someone here knows a way to do this, since it looks like QueryPath can't.

My imports work 99% of the way with XPath. However, if I put just "." in the field, I get all the content INCLUDING the enclosing XML tag. If I put string(.) instead, I get the right content, but all the HTML markup is stripped, so all I have is plain text.

Is there a way I can get the full, entire contents of the element I am trying to import, without getting the primary tag?

For example, a way to import:
<text id="helloworld"><a id="textid" href="http://www.google.com">Hello World</a></text>

so that it returns:

<a id="textid" href="http://www.google.com">Hello World</a>
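To make the three behaviours concrete outside of Feeds, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It only illustrates the difference between the whole element, its string value, and its inner markup; it is not how the Feeds XPath parser works internally. The helper names (outer_xml, string_value, inner_xml) are made up for this example, and the sample assumes the href attribute is properly quoted.

```python
import xml.etree.ElementTree as ET

# Sample element from the question (assumes a well-formed href attribute).
SAMPLE = (
    '<text id="helloworld">'
    '<a id="textid" href="http://www.google.com">Hello World</a>'
    '</text>'
)


def outer_xml(element):
    """Whole element including its own tag, roughly what "." gives."""
    return ET.tostring(element, encoding="unicode")


def string_value(element):
    """Concatenated text only, roughly what string(.) gives."""
    return "".join(element.itertext())


def inner_xml(element):
    """Child markup and text, without the element's own tag."""
    parts = [element.text or ""]
    for child in element:
        # tostring() also emits the text that follows each child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


text_element = ET.fromstring(SAMPLE)
print(outer_xml(text_element))     # includes the <text> wrapper
print(string_value(text_element))  # plain text, markup stripped
print(inner_xml(text_element))     # the <a> element only
```

The third result is what I am after: the contents of the element with the markup intact but without the wrapper tag.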

Comments

nmillin

Have you tried
//a[@id='textid']
This would import only the a tag, without the surrounding text tag.
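If you want to try the expression outside of Feeds first, here is a rough Python sketch using the standard library's xml.etree.ElementTree. Treat it as an assumption-laden illustration: ElementTree only supports a subset of XPath and wants a relative path (hence the leading "."), so the Feeds XPath parser may behave differently. The sample markup is the one from the question with the href quoted properly.

```python
import xml.etree.ElementTree as ET

# Sample from the question (assumes a well-formed href attribute).
SAMPLE = (
    '<text id="helloworld">'
    '<a id="textid" href="http://www.google.com">Hello World</a>'
    '</text>'
)

root = ET.fromstring(SAMPLE)

# ElementTree needs a relative path, so //a[...] becomes .//a[...] here.
link = root.find(".//a[@id='textid']")

# Serialize just the matched <a> element; the <text> wrapper is not included.
link_markup = ET.tostring(link, encoding="unicode")
print(link_markup)
```

Note that this selects one specific child by its id, so it only helps when you know in advance which tag sits inside the wrapper.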

http://www.w3schools.com/xpath/xpath_syntax.asp has the syntax that got me started. I also use the Feeds Tamper module to clean up the HTML that I'm importing (RegEx, Find/Replace and a bunch more options).

-Nate