Feed Items
These are the content types used to display the content imported from the feed. The Feed Item content type should contain all the fields you wish to show on your site. In the following example, job is the Feed Item content type.
Feed Processor Node
This is the content type the importer is attached to. Creating a node of this type is what triggers the import. In the following example, xml job feed is the Feed Processor Node.

Step 1: Download 4 modules: Feeds, Job Scheduler, Features, Feeds XPath Parser (or Feeds Ex).
Step 2: Enable the following modules: Feeds, Feeds Admin UI, Job Scheduler [required by Feeds], Feeds News [requires Features], Features, Feeds XPath Parser.
Step 3: Set up two new content types: one for the Feed Items, one for the Feed Processor. For example, if we were importing an XML list of jobs, we would create these content types:
- job
- xml job feed
Step 4: Add additional fields to the feed item content type. For example, to add a 'salary' field, I would add a number or text field to the job content type named field_salary.

You are now ready to configure your feed importer.

Step 5: Create a new importer. You can navigate there by going to Admin - Site Building - Feed Importers (D6) or Structure - Feeds Importers (D7). Give it a name and description. In this example, the name is 'Jobs XML Feeds'.
Step 6: Configure your new importer:

Basic settings

  • Settings
  • Attached to: Select the content type for your Feed Processor Node, e.g. xml job feed. (Make sure you don't select the content type you want the feed to create.)
  • For testing, make sure you check Import on submission so you can see results. *

Fetcher

  • Settings

HTTP Fetcher

  • For testing purposes, it may be worth using File Upload.

Parser

  • Select: XPath XML parser

Processor

  • Node Processor Settings - This is where you select the content type you want Feeds to create to hold the content it imports. This is the Feed Item content type, job in this example, not the Feed Processor Node.
  • Mapping - For each XML field you want mapped, choose XPath Expression from the Source drop-down and select a destination in the Target drop-down.

At this point, we are only marking which fields will accept an XPath expression. The expressions themselves are entered in Step 8.

Now it's time to create your feed.

Step 7: Create the content you want to process with the feed processor, in this case xml job feed. Give it any name. Now analyze the XML of your feed to determine its structure.

<?xml version="1.0"?>
<jobs>
   <item>
      <title>XML Developer</title>
      <src url="http://www.here.com">Here.com</src>
      <salary ccy="usd">50,000</salary>
      <salary ccy="gbp">30,000</salary>
      <publish_date>Tue, 06 Oct 2010 15:21:48 +0000</publish_date>
      <description>A job creating applications with XML.</description>
   </item>
   <item>
      <title>Drupal Developer</title>
      <src url="http://www.there.com">There.com</src>
      <salary ccy="usd">60,000</salary>
      <salary ccy="gbp">40,000</salary>
      <publish_date>Tue, 07 Oct 2010 15:21:48 +0000</publish_date>
      <description>A job creating applications with Drupal.</description>
   </item>
</jobs>

Step 8: Create XPath Mapping
To map the fields, the XPath settings on your Feed Processor Node would look something like:

context:  //item 
title: title
body: description
url: src/@url
field_salary: salary[@ccy="usd"]
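
Before entering these queries in Drupal, it can help to check them against the sample file outside of Feeds. The following is a minimal sketch in Python, not part of Feeds or Feeds XPath Parser. It uses only the standard library, whose ElementTree module supports a subset of XPath, so the attribute query src/@url is emulated with .get("url"). The function name preview_mapping and the trimmed sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample feed from Step 7 (publish_date omitted).
SAMPLE = """<jobs>
   <item>
      <title>XML Developer</title>
      <src url="http://www.here.com">Here.com</src>
      <salary ccy="usd">50,000</salary>
      <salary ccy="gbp">30,000</salary>
      <description>A job creating applications with XML.</description>
   </item>
   <item>
      <title>Drupal Developer</title>
      <src url="http://www.there.com">There.com</src>
      <salary ccy="usd">60,000</salary>
      <salary ccy="gbp">40,000</salary>
      <description>A job creating applications with Drupal.</description>
   </item>
</jobs>"""


def preview_mapping(xml_text):
    """Hypothetical helper: apply the tutorial's queries to each context node."""
    root = ET.fromstring(xml_text)
    rows = []
    # context: //item (every <item> element anywhere in the document)
    for item in root.iter("item"):
        rows.append({
            "title": item.findtext("title"),                       # title
            "body": item.findtext("description"),                  # description
            "url": item.find("src").get("url"),                    # src/@url
            "field_salary": item.findtext("salary[@ccy='usd']"),   # salary[@ccy="usd"]
        })
    return rows


rows = preview_mapping(SAMPLE)
for row in rows:
    print(row)
```

Each dictionary corresponds to one job node the importer would be expected to create, so an empty or None value points at a query worth rechecking.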

There are two ways to create mappings:

  • Create the mapping at the time of importing data from the feed. In Drupal, the basic control structure is a Node. Feed Importation is triggered by creating a Feed Processor Node (xml job feed). Every time a Feed Processor Node is created, the mappings need to be re-entered. This has the advantage of allowing you to tweak the mappings until you're satisfied with the results.
  • Create the mapping while configuring the Feed Importer. Hardcode the mapping in the Importer's Parser settings. See F.A.Q. for more information.

There are two paths to trigger the creation of a Feed Processor Node:

  • Go to the Import page (a link is on the list of Importers) and select the Feed Processor Node (xml job feed) you wish to use to trigger importation. This will take you to the Create xml job feed page.
  • Or go directly to the Content creation page and create the Feed Processor Node (xml job feed).

On the Create xml job feed page, go to the FEED section and enter in the above XPath Mappings. Notice there are debug settings in the XPATH PARSER OPTIONS section. Load your XML file.

Step 9: Save the Feed Processor Node. This starts the importation process. (In Drupal 6, save and then click on the import tab and click 'Import')

Your feed should have created two new job nodes, populated with the correct data. You can view them by going to the Content Page and looking for the most recently added content.

More questions?

Check out the F.A.Q.

*I spent quite some time thinking something was wrong with my XML, since every time I uploaded a file it said nothing was imported but showed no errors. Finally I checked this box and it all just worked.

Comments

mardok’s picture

Thanks for this tutorial.
I followed these instructions step by step.
I created two content types named "News" and "XML news".
On Feed Importer Building:
>> Basic settings: -- Name: "News" -- Attach to content type: "XML news" -- Import on submission: Unchecked
>> HTTP Fetcher
>> Settings for HTTP Fetcher: All unchecked
>> Parser: "XPath HTML parser"
>> Settings for XPath HTML parser: I saved
>> Processor: "Node processor"
>> Node processor Settings: -- Content Type:News -- Input format:Default format -- Expire nodes:Never -- Do not update existing nodes
>> Mapping for Node processor: -- xpathparser:0 > Title -- xpathparser:1 > Body -- xpathparser:2 > URL (unique target unchecked)
>> Import > XML news
>> Title > Test
>> I've used the same xml file of this tutorial
>> context: /jobs/item
>> title: title
>> body: description
>> url: src/@url
>> Select the queries you would like to return raw XML or HTML: all unchecked
>> experimental > Use tidy > unchecked
>> URL: http://www.newsmet.com/sites/default/files/xml-sites/prova.xml
>> I Saved the file
>> Message: "XML news Test has been created."
>> Import
>> Message: "There is no new content."

Why doesn't it work?

kapayne’s picture

Did you figure this out? I'm having similar problems.

lancerkind’s picture

When I created my feeds node for my real-world use, I got a bunch of warnings about:

# warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseStartTag: misplaced <body> tag in Entity, line: 739 in /home2/lancerki/public_html/woaidianying/sites/all/modules/feeds_xpathparser/FeedsXPathParser.inc on line 397.

And for a finale, an error related to namespaces:

# There was an error with the XPath query: /x:html/x:body/x:div/x:div[2]/x:div[2]/x:div[1]/x:div[1]/x:ul/x:li[4]/x:table/x:tbody.
Libxml returned the message: Undefined namespace prefix, with the error code: 1219.
# Could not retrieve title from feed.

In the real world, XML uses namespaces all over the place. The tutorial doesn't use namespaces. Hmmm... I should be able to do the same thing by expanding the "x:" to "http://www.w3.org/1999/xhtml:", which makes a long and ugly XPath. How about a section for declaring namespaces? Is there a way to handle this other than manual namespace expansion?
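
For context, the "Undefined namespace prefix" message means the x: prefix was never bound to a namespace URI for the query. How Feeds XPath Parser registers prefixes is not covered here. The sketch below only illustrates the general idea with Python's standard library, where a prefix-to-URI dictionary is passed alongside the query. The sample document and the function name first_cell are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced XHTML page with a default namespace.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table><tbody><tr><td>Example title</td></tr></tbody></table>
  </body>
</html>"""

# Bind the prefix used in the queries to the XHTML namespace URI.
NAMESPACES = {"x": "http://www.w3.org/1999/xhtml"}


def first_cell(xml_text):
    """Return the text of the first table cell, using prefixed steps."""
    root = ET.fromstring(xml_text)  # root is the <html> element
    cell = root.find("x:body/x:table/x:tbody/x:tr/x:td", NAMESPACES)
    return cell.text if cell is not None else None


cell_text = first_cell(XHTML)
print(cell_text)

# Without a prefix the same step matches nothing, because every element
# lives in the XHTML namespace.
print(ET.fromstring(XHTML).find("body"))
```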

Even though I haven't got it working yet, this tool is totally going in the right direction and I'm very excited. Good work and thank you for sharing this! I'm a programmer and a writer. I'd love to contribute documentation for this tool when I get it working.

Cheers,

xurshid29’s picture

http://drupal.org/node/919448#comment-3517006 - Problem with this tutorial

I am having the same problem. Did you solve it?

rwsimmo’s picture

I am also having trouble with this tutorial. Here are some questions:

1. If you are parsing XML, why are you using the XPath HTML parser?
2. Should a new content type really be created for the "xml job feed" or should the Feed content type be used?
3. Where does the following go? In the body of the "xml job feed"?
context: /jobs/item
title: title
body: description
url: src/@url
field_salary:salary/[@ccy="usd"]

lancerkind’s picture

I can answer 1, 2, and part of 3.

1: XML and HTML are cousins. HTML can be formatted as "well formed XML" if the rules of XML are followed. (Most modern HTML is well formed. All developers and modern tools should be generating well formed XML.) So HTML is a sub-type of XML if the rules of XML are followed.

2: An intermediate content type needs to be created for the Feeds framework to work with the data. When the parser pulls in the data from the website, it needs something to store it in. It stores it in a "Feed" content type (Job Feeds in this example). And from this object, the Feeds framework's Processor can move the data to its final type, the Job type in this example.

3: Context, URL, field_salary are in the body of, per your words, "xml job feed". After you create an "xml job feed" node, scroll down and expand "Feed" and you'll see the text boxes for these options. As for title and body, I too seek enlightenment. I don't understand how title and body are supposed to be mapped to an XPath query.

BTW, a great Firefox tool to install is XPath Checker. I've never had the chance to work a lot with XPath, and XPath Checker is pretty awesome and lets me know that at least my XPath expressions are correct. Without it, I have two areas of uncertainty: Drupal Feeds/XPath parser AND my XPath expressions.

lancerkind’s picture

How are the non-CCK fields mapped to an XPath? Title and Body particularly.

lancerkind’s picture

Using Drupal 7, and Feeds 7.x-2.0-alpha3, Feeds XPath Parser 7.X-1.0-beta1, I followed this example through all the steps. But something is wrong (or a bug) with step 3, adding new fields.
I'm unable to "see" newly added fields during the mapping of step 6.

In other words, I created a salary field for Xml Job Feed, associated that node with a new Feeds Processor, but when I look at the mappings and add an XPath mapping, I'm not seeing the new fields I've added for targets.

Anyone else having this issue with Drupal 7, and Feeds/Xpath?

lancerkind’s picture

The reason I wasn't seeing my new salary field (or the three other ones I tacked on to try and debug things) is that I hadn't correctly set the Node Processor, Content type. It was still set to the default of Article rather than my final node type, Jobs.

Remember, Article won't have any of my new fields, just Jobs. So if you find yourself in the situation that you can't map an XPath to a target because your field won't show up in the drop down, check the Node Processor (in the left nav box), click on it and check the dropdown labeled Content type.

lancerkind’s picture

More than once, I've been in the create Importer screens, defining my mapping, and I hadn't selected the correct Feeds Node Processor (defined in Processor settings).

When defining the Node Processor mappings, if that screen showed what content type was set for the Feeds Node Processor, that would let me identify mistakes more easily. Hell, make it a link to the content type so if I need to go back and add a field, I just click on the Feeds Node Processor content type and add it. If I forget to set the Feeds Node Processor and I'm trying to create mappings to fields that don't exist, on that screen I'll see that the type was still set to the default of Article.

sergiod’s picture

Hi!
Very good tutorial, and module :)
It's working on my machine, but I need to filter results.
My xml is something like this:

<category href="mysite.com" score="0.3333333333333333">
 <id>sicurezza_alimentare</id>
 <label lang="it">sicurezza alimentare</label>
</category>

I take the value of "label" and I update my drupal taxonomy (a "tag" taxonomy).
The problem is that I need to take this value ONLY if the "score" number is less than X.
Is it possible?

Thanks,
Sergej

aenw’s picture

The FAQ page later in this section says that you can use a XPath query ([#980380]).
(I'm not that familiar with Feeds XPath yet, but I am somewhat familiar with XPath in general.)

Alex777’s picture

If I understood correctly, it is possible. Check here:
http://www.w3schools.com/xpath/xpath_syntax.asp
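
In plain XPath 1.0, a numeric predicate on the attribute, such as category[@score < 0.5]/label, selects the label only when the score is below the threshold. Whether Feeds XPath Parser accepts it unchanged in a mapping field is not confirmed here. The Python sketch below emulates that predicate with the standard library, since ElementTree does not support numeric comparisons in predicates. The <categories> wrapper, the second category, the 0.5 threshold and the function name labels_below are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The first <category> is from the question; the wrapper element and the
# second <category> are made up so the filter has something to exclude.
SAMPLE = """<categories>
  <category href="mysite.com" score="0.3333333333333333">
    <id>sicurezza_alimentare</id>
    <label lang="it">sicurezza alimentare</label>
  </category>
  <category href="mysite.com" score="0.9">
    <id>altro</id>
    <label lang="it">altro</label>
  </category>
</categories>"""


def labels_below(xml_text, threshold):
    """Emulate category[@score < threshold]/label."""
    root = ET.fromstring(xml_text)
    return [
        category.findtext("label")
        for category in root.iter("category")
        if float(category.get("score")) < threshold
    ]


labels = labels_below(SAMPLE, 0.5)
print(labels)
```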

aenw’s picture

I don't think "salary/[@ccy="usd"]" is a valid XPath expression. I get an error message saying it's not valid, and when I test it with other XPath expression testers/validators on the web, they also report it as invalid.

Shouldn't the expression have the "/" removed? So the expression should be salary[@ccy="usd"] ?

Has anyone been able to get that expression to work?

- aenw

ULuvLucy’s picture

You are correct, the "/" in this tutorial should not be there for the example given.

medden’s picture

I have removed the / after the salary. Thanks for pointing it out and checking.

patcon’s picture

In case this helps anyone debug or test this tutorial, here's the xml file at a gist url:

https://raw.github.com/gist/1444186/0d3fa20f15b5883ab72aa4916c6bc4a83914...

eod696’s picture

I've got feeds and XPath Parser, but I can't generate a feed like that from the content-types I've created. The only kind of XML feed output I can get Drupal to give is a standard RSS article which looks like:

<item>
    <title></title>
    <link></link>
    <description></description>
    <pubDate></pubDate>
    <dc:creator></dc:creator>
    <guid isPermaLink=""></guid>
</item>

How is the custom XML feed in this tutorial generated?

mephiszto’s picture

Hi,

I am quite new to Drupal and XPath. I need to import some data from the following xml file:

<?xml version="1.0"?>
<order id="invoice 1">
    <detail>
        <type>apple</type>
        <quantity>10</quantity>
        <price>20</price>
    </detail>
    <detail>
        <type>banana</type>
        <quantity>5</quantity>
        <price>3</price>
    </detail>
</order>

I am using the following mapping:
Context: //order
title: @id
field_type: detail/type
field_quantity: detail/quantity
field_price: detail/price
field_full_price: detail/quantity * detail/price

I am getting the following output:

invoice 1

type:
apple
banana
quantity:
10
5
price:
20.00
3.00
full price:
200.00

My 1st problem is that I don't get the full price of the second product.
My 2nd problem is that I don't know if it's possible to get an output formatted like this:

type:..........quantity:..........price:..........full price:
apple...............10...............20...............200
banana...............5...............3...............15

(like a table or similar look)

My 3rd problem is that I would need a field which would represent the sum of full prices (in this case, it should be 215), but I don't know how the XPath query should look.

I would really appreciate any help. Thank you !
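
On the first and third problems: in XPath 1.0, an arithmetic operator converts each node-set operand to a number using only its first node, so detail/quantity * detail/price evaluates 10 * 20 once. The sum() function can add up existing nodes, but not per-row products. One way around this is to compute the values outside XPath. A minimal Python sketch under that assumption follows. It uses the standard library only, the function name line_totals is hypothetical, and this is not a Feeds feature.

```python
import xml.etree.ElementTree as ET

ORDER = """<order id="invoice 1">
    <detail>
        <type>apple</type>
        <quantity>10</quantity>
        <price>20</price>
    </detail>
    <detail>
        <type>banana</type>
        <quantity>5</quantity>
        <price>3</price>
    </detail>
</order>"""


def line_totals(xml_text):
    """Return (type, quantity, price, full price) for each <detail> row."""
    root = ET.fromstring(xml_text)
    rows = []
    for detail in root.findall("detail"):
        quantity = int(detail.findtext("quantity"))
        price = float(detail.findtext("price"))
        rows.append((detail.findtext("type"), quantity, price, quantity * price))
    return rows


rows = line_totals(ORDER)
grand_total = sum(row[3] for row in rows)

for row in rows:
    print(row)
print(grand_total)
```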

khalemi’s picture

thanks, this works for me.

ashoktcr’s picture

I have successfully created the XML feeds (content type = job), but I have found one problem: the XML tags are also included in the data, i.e. the title, description and salary tags are still there.
The output is as follows:

<title>Drupal Developer</title>

<description>A job creating applications with Drupal.</description>

salary: 
<salary ccy="usd">60,000</salary>

How do I get the proper output?
Thanks in advance...

ashoktcr’s picture

I found the answer in this tutorial: just uncheck the checkbox under "Select the queries you would like to return raw XML or HTML" in the XPath XML parser settings.

stomerfull’s picture

Thank you very much, it works. But what about the multilingual option? I have a multilingual site.

I have followed some settings to get Feeds working on a multilingual site.

I have installed these modules:
http://drupal.org/project/title
http://drupal.org/project/feeds_et

and followed these instructions for entity translation: http://drupal.org/node/1280910

but I am not able to get it working.

No error appeared and no node was created.

Thank you for your help.

Matt Tews’s picture

This is working fine for me. Thank you for the tutorial. I am literally doing a Job Listing area updated from an external XML feed, so this is perfect.

I see you have Scheduler listed as an included module. I have the module installed. Question: How do I schedule the import?

I see you have "There are two paths to trigger the creation of a Feed Processor Node" but I am looking to have this update automatically a few times per day.

Thank you for your help!

eliaspallanzani’s picture

I followed these instructions step by step but the result is a blank page :(.
(Solved: I disabled another rule that conflicted with the importer.)

pkosenko’s picture

I am having trouble figuring out what the XPath parser is supposed to be doing.

I THOUGHT that it would import the CONTENT (TEXT) of a node that is selected, but it imports the full XML node with the XML tags wrapped around it. And the xpathparser:0 style variables for the labels of the fields.

Then I deleted that feed and recreated a newer and simpler one and THAT worked.

So SOMETHING happened that forced the Feed to misread the SOURCE items as TARGET and display SOURCE?

Bizarre. If it happens again, I will try to track it down here.