Feed Import

Last updated February 16, 2013. Created by Sorin Sarca on December 3, 2011.
Edited by Gys, shamio, penomosa.

Feed Import is a module that allows you to import content into entities from various file types (such as XML, HTML and CSV).
On this page you'll find a general guide to using Feed Import, plus examples of how to build a feed configuration to import an XML file, an HTML page or a CSV file.

For the list of filters provided by Feed Import, check this page.

Using Feed Import

First, navigate to admin/config/services/feed_import. You can change global settings using the "Settings" link. To add a new feed, click the "Add new feed" link and fill in the form.
After you have saved the feed, click the "Edit" link in the operations column. You can change the process function and its settings in the PROCESS FUNCTION SETTINGS fieldset. At the bottom is a fieldset with XPATH settings. Add the XPATH for the required parent item and the unique id (you can now save the feed). To add a new field, choose one from the "Add new field" select and click the "Add selected field" button. A fieldset with field settings appears, where you can enter xpath(s) and a default action/value. If you wish, you can add further fields; when you are done, click the "Save feed" submit button.
Check that all fields are OK. If you want to (pre)filter values, select the "Edit (pre)filter" tab. You will see a fieldset for each selected field. Click the "Add new filter" button for the desired field to add a new filter. Enter a filter name that is unique per field (this can be anything that lets you quickly identify the filter), enter a function name (any PHP function, including static methods written as ClassName::functionName) and enter the parameters for the function, one per line. To send the field value as a parameter, enter [field] on a line. There are some static filter functions in the feed_import_filter.inc.php file (class FeedImportFilter) that you can use. Please take a look. I'll add more soon.
If you want to replace [field] with a different placeholder, go to Settings. You can add or remove any filters you want, but don't forget to click the "Save filters" submit button to save them all.
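
The filter mechanism described above can be pictured as a small pipeline: each filter names a function and a list of parameters, and [field] is swapped for the current value before the call. The Python sketch below only illustrates that idea; it is not the module's PHP code, and apply_filters and the (function, params) layout are hypothetical names made up for the example. It also assumes that filters on one field run in order, each receiving the previous result.

```python
# Illustrative sketch only: Feed Import itself is written in PHP.
# apply_filters and the (function, params) layout are hypothetical.

PLACEHOLDER = "[field]"  # the placeholder used on this page


def apply_filters(value, filters):
    """Run a field value through an ordered list of filters.

    Each filter is (function, params). Every param equal to the
    placeholder is replaced by the current value before the call,
    and the return value becomes the input of the next filter.
    """
    for function, params in filters:
        args = [value if p == PLACEHOLDER else p for p in params]
        value = function(*args)
    return value


# Two chained filters: trim whitespace, then upper-case the result.
filters = [
    (str.strip, ["[field]"]),
    (str.upper, ["[field]"]),
]

result = apply_filters("  empire burlesque ", filters)
print(result)
```

Parameters other than [field] are passed through unchanged, which is how a second line such as a precision or a replacement string would reach the function.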

Now you can enable feed and test it.

XML example

As the XML file we will use an example from w3schools: http://www.w3schools.com/xml/cd_catalog.xml.

1- First we create a new content type (admin/structure/types/add) named "Music collection" (with machine name: music_collection) and add some fields to it:

  • field_artist (text)
  • field_country (text)
  • field_year (integer)
  • field_price (float)

- Save this content type.

2- Go to Feed Import (admin/config/services/feed_import).
2.1- Add a new feed named "Music collection feed".
2.2- Select entity name "node".
2.3- Set xml url to "http://www.w3schools.com/xml/cd_catalog.xml" (without quotes).
2.4- Select "processXML" for process function and click on "Add feed" button to save the new feed.

3- Click on the Edit link to edit the new feed.
3.1- Scroll down to XPATH SETTINGS at the bottom of the page and type "//CD" (without quotes) in the "ENTER PARENT ITEM XPATH" field.
3.2- Type "TITLE" (without quotes) in the "Enter XPATH TO UNIQUE IDENTIFIER OF ITEM" field.
3.3- In the "SELECT DEFINED FIELD" drop-down menu, choose "type" and click on the "Add field" button.
3.4- Leave the XPATH textarea blank and enter "music_collection" (without quotes) as the default value; this sets the node type we want to import into. Note 1: filling in the 'type' field is mandatory! Note 2: If you are using the "Rubik" theme as your admin theme, you won't see the "default value" field. Switching back to the core "Seven" admin theme is recommended.
3.5- Add more fields with xpaths. The list below is written as fieldname: xpath_value, where "fieldname" is the machine name of your content type field and "xpath_value" is the name of the element in your XML source. In the XPATH box of each field enter only the xpath (for example ARTIST), not the whole "fieldname: xpath_value" pair.
For instance:

  • title: TITLE
  • field_artist: ARTIST
  • field_country: COUNTRY
  • field_year: YEAR
  • field_price: PRICE

You can also add a default value for each field (for title you can add "Unknown song", for artist "Unknown artist" and so on).
If you can't find a field in the select, clear the cache and refresh the page (don't forget to save).
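
To see what the parent xpath and the per-field xpaths select, here is a rough Python sketch using the standard-library ElementTree. It is not how the module works internally (the module is PHP). The sample is a hand-written fragment in the same shape as cd_catalog.xml, and FIELD_XPATHS and DEFAULTS are names invented for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the same shape as cd_catalog.xml.
SAMPLE = """
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
</CATALOG>
"""

# Field xpaths are relative to each parent item (each <CD>).
FIELD_XPATHS = {
    "title": "TITLE",
    "field_artist": "ARTIST",
    "field_country": "COUNTRY",
    "field_year": "YEAR",
    "field_price": "PRICE",
}
DEFAULTS = {"title": "Unknown song", "field_artist": "Unknown artist"}

root = ET.fromstring(SAMPLE)
items = []
# ElementTree wants a leading "." in front of "//CD".
for cd in root.findall(".//CD"):
    item = {"type": "music_collection"}  # default value, no xpath
    for field, xpath in FIELD_XPATHS.items():
        item[field] = cd.findtext(xpath, default=DEFAULTS.get(field))
    items.append(item)

for item in items:
    print(item)
```

The parent xpath picks the items, and every field xpath is then evaluated inside one item, which is why TITLE and not //CD/TITLE is used for the fields.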
Note: For this example we don't use pre-filters, only a filter.
- Go to the EDIT FILTERS tab and click on FIELD_PRICE.
- Now click "Add new filter ...". For the filter name enter "Round price" (without quotes), for the function name enter "round" (without quotes) and on the first line of the function params enter "[field]" (without quotes).
If you look at the PHP round() function you will see that it accepts a second parameter for precision. Don't enter a second parameter for now; save the filters by pressing the "Save filters" button.
The feed is ready to be processed. Go back to admin/config/services/feed_import and click the "Process" operation for this feed. A success status should appear. To view the imported content, go to admin/content and filter by type "Music collection". Check the content to see that every item has a rounded price (compared to the XML file).
Go back to the filters for the price field and add a second parameter (on the second line) for the round function, with a value of 1. Save and process the import again. You will notice that the items were updated rather than duplicated, and that the prices have changed.
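
Why the second run updates items instead of duplicating them can be sketched as follows: items are keyed by the unique identifier (TITLE in this example), so an item seen again overwrites the stored one. This is a simplified Python illustration under that assumption, with a hypothetical in-memory dictionary standing in for the site's nodes. Note also that Python's round() and PHP's round() treat exact halves differently, so the sample prices avoid ties.

```python
# Hypothetical in-memory stand-in for the imported nodes.
nodes = {}


def process(records, precision=None):
    """Import records keyed by their unique identifier (the TITLE)."""
    for record in records:
        price = float(record["PRICE"])
        if precision is None:
            rounded = round(price)             # filter params: [field]
        else:
            rounded = round(price, precision)  # filter params: [field], 1
        # Same unique id: the existing entry is replaced, not duplicated.
        nodes[record["TITLE"]] = {"field_price": rounded}


records = [
    {"TITLE": "Empire Burlesque", "PRICE": "10.90"},
    {"TITLE": "Hide your heart", "PRICE": "9.90"},
]

process(records)               # first run, no precision
first_run = dict(nodes)
process(records, precision=1)  # second run, one decimal place

print(first_run)
print(nodes)
```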

Even more...
But this isn't enough... Now we want to tag these nodes with Country and Artist so that we can filter the nodes.
To do that, first go to manage fields for our content type and add the existing field field_tags. Then go back to Feed Import, edit our feed configuration and add the field_tags field. Set the XPATH to COUNTRY|ARTIST (this selects both country and artist) and for the action when the result is empty select "Ignore this field". Save the feed and go to the EDIT FILTERS tab. Add a new filter for field_tags with name "Tag with country and artist", function "FeedImportFilter::setTaxonomyTerms", first parameter (on the first line) "[field]" and second parameter (on the second line) "Tags" (all without quotes). Save the filter and process the feed. You will now see that all nodes are tagged and that you can filter by country or artist.
Note: If you don't want to add new terms to the vocabulary, use FeedImportFilter::getTaxonomyIdByName instead of FeedImportFilter::setTaxonomyTerms.
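
The tagging step combines two ideas: an xpath union that returns several values for one field, and a filter that maps term names to term ids. The Python sketch below is a loose illustration only. ElementTree has no union operator, so select_union emulates COUNTRY|ARTIST by running each part separately; set_terms and get_term_ids are hypothetical stand-ins for the two FeedImportFilter functions, and dropping unknown names in the lookup-only variant is an assumption, not documented behaviour.

```python
import xml.etree.ElementTree as ET

CD = ET.fromstring(
    "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST>"
    "<COUNTRY>UK</COUNTRY></CD>"
)


def select_union(element, xpath):
    """Emulate an XPath union such as COUNTRY|ARTIST.

    Results come back part by part, not in document order as a
    real XPath engine would return them.
    """
    values = []
    for part in xpath.split("|"):
        values.extend(e.text for e in element.findall(part.strip()))
    return values


def set_terms(names, vocabulary):
    """Return term ids, creating terms that do not exist yet."""
    ids = []
    for name in names:
        if name not in vocabulary:
            vocabulary[name] = len(vocabulary) + 1
        ids.append(vocabulary[name])
    return ids


def get_term_ids(names, vocabulary):
    """Return ids of existing terms only (assumed: unknown names are dropped)."""
    return [vocabulary[name] for name in names if name in vocabulary]


tags = {"UK": 1}  # hypothetical "Tags" vocabulary: name -> term id
names = select_union(CD, "COUNTRY|ARTIST")
lookup_only = get_term_ids(names, dict(tags))
created = set_terms(names, tags)

print(names, lookup_only, created)
```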

HTML example

For this example we will use the Drupal Commerce demo site http://demo.commerceguys.com/dc/ as the HTML page and extract the products from its first page (hope they don't mind the "stolen" products).
First we have to create a new content type to store the imported products. Create one named "Commerce products" with machine name commerce_products. Add the following fields:

  • field_image (Image)
  • field_price (Float)

Now go to admin/config/services/feed_import and add a new feed named "Commerce products" with entity name node and url http://demo.commerceguys.com/dc/, and select processHTMLPage as the process function.
After saving this feed, edit it. Scroll down to XPATH SETTINGS and for the parent xpath use:
//div[@id='content']//article[@id]/div
Do not enter anything for the unique xpath, because this will be a one-time import.
Now add the following fields:

  • type
    do not use any xpath for this but enter a default value: commerce_products
  • title
    use for xpath header/h1/a and for action if empty "Skip importing this item" because we don't want to import content without title.
  • body
    use for xpath div[@class='article-content']/div[2]//p[1] and for action if empty "Skip importing this item" because we don't want to import content without body.
  • field_price
    use for xpath div[@class='article-content']/div[3]/section/div[@class='field-items']/div and for action if empty "Skip importing this item" because we want only complete products.
  • field_image
    use for xpath div[@class='article-content']/div[1]//img/@src and for action if empty "Ignore this field".
  • uid
    don't use any xpath for this just enter default value 1 (this node will appear as added by admin but you can ignore this field)
  • promote
    don't use any xpath for this just add default value 1 so this will be promoted to front page (can be omitted)
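
The "action if empty" choices in the list above decide what happens when an xpath returns nothing. A minimal Python sketch of that decision, assuming the three actions behave as their labels suggest; build_item and the constant names are made up for this example and are not part of the module:

```python
SKIP_ITEM = "skip_item"        # "Skip importing this item"
IGNORE_FIELD = "ignore_field"  # "Ignore this field"
DEFAULT_VALUE = "default"      # "Provide a default value"


def build_item(extracted, config):
    """Return the item to import, or None if the item must be skipped."""
    item = {}
    for field, (action, default) in config.items():
        value = extracted.get(field)
        if value not in (None, ""):
            item[field] = value
        elif action == SKIP_ITEM:
            return None            # one empty required field drops the item
        elif action == DEFAULT_VALUE:
            item[field] = default
        # IGNORE_FIELD: the field is simply left out of the item
    return item


config = {
    "type": (DEFAULT_VALUE, "commerce_products"),
    "title": (SKIP_ITEM, None),
    "field_image": (IGNORE_FIELD, None),
    "promote": (DEFAULT_VALUE, 1),
}

complete = build_item({"title": "Black Mug"}, config)
untitled = build_item({"field_image": "mug.jpg"}, config)

print(complete)
print(untitled)
```

Here the product without an image is still imported, while the one without a title is skipped entirely.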

OK, now save the feed, because we still have to add filters.
Go to the filters tab: we have to extract the product image and store it in field_image. We can do this with filters from the FeedImportFilter class.
Because the xpath selects an attribute, we have to extract its value.
Add a filter named "Get src attribute" with function name "FeedImportFilter::getProperty" and two params: [field] and src.
Now we have the URL of the image, but that image is a thumbnail and is of no use to us. We need a large image, and for this we will use a simple hack.
Add a new filter named "Get large image not thumbnail" with function name "FeedImportFilter::replace" and three params: [field], /thumbnail/ and /large/
Now we have turned a URL like

http://demo.commerceguys.com/dc/sites/default/files/styles/thumbnail/public/field/image/Black_Mug.jpg

which contains a thumbnail image into one like

http://demo.commerceguys.com/dc/sites/default/files/styles/large/public/field/image/Black_Mug.jpg

which contains a large image, and that's what we wanted to do.
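
The same substitution, written out in Python purely as an illustration of what the replace filter does to the URL (the filter itself is the module's PHP function):

```python
thumbnail_url = (
    "http://demo.commerceguys.com/dc/sites/default/files/styles/"
    "thumbnail/public/field/image/Black_Mug.jpg"
)

# The slashes are part of the search string, so only the path segment
# "/thumbnail/" is replaced, not the word "thumbnail" inside a file name.
large_url = thumbnail_url.replace("/thumbnail/", "/large/")

print(large_url)
```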
Finally, we have to save the image to the field.
Add a new filter named "Save image" with function name "FeedImportFilter::saveImage" and three params: [field], public://field/image/ and 1 (the last param replaces an existing file; available since v2.6)

For field_price we have to turn data like $10.00 into a float like 10.00. A simple replace should be enough.
Add a filter for field_price named "Remove dollar sign" with function name "FeedImportFilter::replace" and two params: [field] and $
For body we use the filter "FeedImportFilter::join" with one param, [field], because some descriptions have more than one paragraph.
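
As a rough picture of what these two filters achieve, here is a Python equivalent. The helper names are hypothetical, and the single space used as glue when joining paragraphs is an assumption; the module's join filter may use a different separator.

```python
def remove_dollar(value):
    """Strip the dollar sign and convert the rest to a float."""
    return float(value.replace("$", ""))


def join_paragraphs(paragraphs, glue=" "):
    """Merge several extracted paragraphs into one body string.

    The glue is an assumption made for this sketch.
    """
    return glue.join(p.strip() for p in paragraphs)


price = remove_dollar("$10.00")
body = join_paragraphs(["A sturdy mug.", " Holds hot drinks. "])

print(price)
print(body)
```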
Don't forget to save the filters; we are then ready for the import.
Process the import and go to admin/content, where you should see all the imported products. Open them to check that they have all their fields, including images.

CSV example

For this example we will create a CSV file containing three columns: User, Mail and Pass.

User,Mail,Pass
user1,user1@example.com,123456
john,john23@example.com,secretpassword
smith,smth@example.com,youllneverknowthis

Save this file on localhost (or elsewhere) so that it can be accessed from a URL.
These users will be imported into the user entity.
Go to admin/config/services/feed_import and create a new feed named "Users from CSV" with entity name "user", process function "processCSV" and the URL of your CSV file.
Now edit the feed. In the PROCESS FUNCTION SETTINGS fieldset, set the "Use column names" setting to 1.
Go to XPATH SETTINGS, use //row for the parent xpath and leave the unique xpath field empty.
Add the following fields:

  • name
    with xpath column[@name="User"]
  • mail and init
    with xpath column[@name="Mail"]
  • pass
    with xpath column[@name="Pass"]
  • status
    with no xpath and default value set to 1.

and save the feed. Because Drupal stores user passwords hashed, we have to hash the imported passwords too.
Go to the filters tab and add a filter for pass named "Hash password" with function name "FeedImportFilter::userHashPassword" and one param: [field]
Save the filters and process the import. Check the new users by navigating to admin/people. If they don't appear, check the reports for possible errors.
If the reports show errors like "Undefined index: mail" in user_save(), you must update the user module.
You can now log in with an imported user name and password.
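
The xpaths //row and column[@name="User"] make sense once the CSV is pictured as a tree of rows and named columns. The Python sketch below builds such a tree by hand and queries it with the same xpaths. It is an approximation for illustration only: csv_to_rows is a hypothetical helper, and the exact element layout produced by processCSV is assumed from the xpaths used in this example. Password hashing is left out because it relies on Drupal's own hashing.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = """User,Mail,Pass
user1,user1@example.com,123456
john,john23@example.com,secretpassword
smith,smth@example.com,youllneverknowthis
"""


def csv_to_rows(text):
    """Turn CSV text into <root><row><column name="...">value</column>...

    The first line supplies the column names ("Use column names" = 1).
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    root = ET.Element("root")
    for record in reader:
        row = ET.SubElement(root, "row")
        for name, value in zip(header, record):
            column = ET.SubElement(row, "column", {"name": name})
            column.text = value
    return root


root = csv_to_rows(CSV_TEXT)
users = []
# ElementTree wants a leading "." in front of "//row".
for row in root.findall(".//row"):
    mail = row.findtext('column[@name="Mail"]')
    users.append({
        "name": row.findtext('column[@name="User"]'),
        "mail": mail,
        "init": mail,   # mail and init share one xpath
        "status": 1,    # default value, no xpath
    })

for user in users:
    print(user)
```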

Comments

problems with XML import

I am trying to follow the XML example, but I get the error below when running cron using Drush, and no new content is added to the Music Collection content type:

SimpleXMLElement::xpath(): Invalid
expression2402/var/www/HPL-Stats/sites/all/modules/feed_import/feed_import.inc.php

Do you know what it means?
Thanks

Pedro Mosquera

Please post an issue

It means that your xpath expression is invalid. Please post an issue (support request) where we can help you! Don't forget to specify your xpaths.

Correction to CSV example

instructions read:

FeedImportFilter::hashUserPassword

should be:

FeedImportFilter::userHashPassword

drupal n00b in training

You are right, thanks. You

You are right, thanks. You can also edit this page if you find other mistakes rather than just commenting.

Add new field_image to XML example

For example, I need to add a new field with images of the artists, without downloading it on my web site, simply show this image from URL:

1. I saved the XML file from the XML example (http://www.w3schools.com/xml/cd_catalog.xml) on my web site as "cd_catalog.xml" and added a new element to it (for showing images from a URL without downloading them):

<IMAGE>http://demo.commerceguys.com/dc/sites/default/files/styles/thumbnail/public/field/image/Black_Mug.jpg</IMAGE>

2. I added a new field to the content type "Music collection":

"field_image (Long text and summary, Text area with a summary)"
with parameters "FULL HTML"

3. add to the Feed Import "Music collection feed" new field with xpath:

"field_image: IMAGE"

with parameters Action when filtered result is empty "Provide a default value" and blank textarea Default value

4. click "Process" (admin/config/services/feed_import) and I have all nodes from "cd_catalog.xml", here is one node example:

Artist:
Bonnie Tyler
Country:
UK
Year:
1 988
Price:
9.90.
Image:
http://demo.commerceguys.com/dc/sites/default/files/styles/thumbnail/pub...

=> The "Image" field shows only a clickable URL to the image; when I click this URL, the image opens in a new window.
But I don't want to see a clickable URL, I need to see the image of the artist in the "Image" field.

=> How can I do this?

Settings for twitter import

Twitter JSON has metadata fields and nests multiple items (each of which is a tweet that will become a node) in a field named "results." After some experimentation, these are the settings I ended up using:

1. URL to feed I used to experiment (search, not stream): http://search.twitter.com/search.json?q=%40twitterapi

2. Process function settings: <?xml version="1.0" encoding="utf-8"?><root/>

3. Parent item xpath: /root/results -- grabs the twitter fields I want

4. XPATH for each field, like: id_str/text() -- didn't grab data without text().

It seems that the SimpleXML object is like a root with no tag, and the fields are its immediate children, so id_str/text() works (and not /id_str/text()).

problem about XML import

I want to know how to use Feed Import. I finished all the steps of the XML example and no error appeared, but no new content was added and I can't find anything. Does anyone know why? Please give me a hand, thanks.

hello every body, I finished

Hello everybody,
I also finished all the steps of the XML example. No error appeared, but the new content was added with empty values.
What am I doing wrong?

thank you very much for your help

Post an Issue

If you want support for use of this module, submit an issue to the issue queue. Support questions probably won't be answered if they appear in comments to a handbook page.

HAJ

Solve empty value content

From my investigations so far, there are two causes of empty values:

  1. The XPATH value must be the xpath alone, not a pair of field machine name and xpath. For example, the XPATH value for the artist field is "ARTIST" (without quotation marks), not "field_artist: ARTIST".
  2. If you are using PHP 5.4, the xpath() method used inside the feeder doesn't work properly, so you have to downgrade PHP to 5.3. Someone already reported the issue here.

Hope this helps!

About this page

Drupal version
Drupal 7.x
Audience
Programmers, Site administrators
Level
Intermediate
Keywords
Content, feed, import
Drupal’s online documentation is © 2000-2013 by the individual contributors and can be used in accordance with the Creative Commons License, Attribution-ShareAlike 2.0. PHP code is distributed under the GNU General Public License. Comments on documentation pages are used to improve content and then deleted.