
The site builder's guide to Feeds

Last updated January 3, 2012. Created by alex_b on November 4, 2009.
Edited by agentolivia, pianodavid, jutulen, Max_Headroom.

Introduction

Feeds is designed to address import and aggregation use cases. It provides a UI for creating and managing multiple import and aggregation configurations side by side.

A single import configuration is called an Importer, and you can create as many importers as you need. Each importer consists of a Fetcher that downloads the feed, a Parser that parses it, and a Processor that "does stuff" with the parsed items, usually storing them.
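The division of labor inside an importer can be pictured as three stages chained together. The plain-PHP sketch below is only a conceptual illustration under that assumption: it does not use Feeds' actual plugin classes, and the functions `example_fetch()`, `example_parse()` and `example_process()`, as well as the sample CSV data, are hypothetical.

```php
<?php
// Conceptual sketch only: these hypothetical functions stand in for a
// Fetcher, a Parser and a Processor. They are NOT the Feeds plugin API.

// Fetcher: returns the raw feed. A real fetcher would download it;
// here a hard-coded CSV string stands in for the downloaded file.
function example_fetch() {
  return "title,published\nFirst item,2011-12-01\nSecond item,2011-12-15\n";
}

// Parser: turns the raw text into a list of items (associative arrays),
// using the first line as column names. Assumes values contain no commas.
function example_parse($raw) {
  $lines = array_values(array_filter(explode("\n", $raw), 'strlen'));
  $header = explode(',', array_shift($lines));
  $items = array();
  foreach ($lines as $line) {
    $items[] = array_combine($header, explode(',', $line));
  }
  return $items;
}

// Processor: "does stuff" with each item; here it only appends the item
// to an in-memory array instead of saving a node.
function example_process(array $items, array &$storage) {
  foreach ($items as $item) {
    $storage[] = $item;
  }
  return count($items);
}

$storage = array();
$created = example_process(example_parse(example_fetch()), $storage);
echo "Stored $created items\n";
```

Because each stage only depends on the output of the previous one, any of them can be swapped independently, which is what the "Change" links on an importer's configuration page let you do.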

Default configurations

When you install Feeds and go to Administer > Site building > Feed importers (D7: admin/structure/feeds), you will find four default importer configurations (five if the Data module is installed):

  • Feed: aggregation importer. Aggregates RSS/Atom feeds into nodes and provides the node types Feed and Feed item. Create one or more "Feed" nodes to add RSS/Atom feeds to your site; on cron, these feeds continuously produce "Feed item" nodes. Requires the Aggregator core module to be enabled.
  • OPML import: imports an OPML file and creates Feed nodes from its entries. Use this configuration together with the "Feed" configuration. To use this importer, go to http://www.example.com/import.
  • Node import: imports nodes from a CSV file. To use this importer, go to http://www.example.com/import.
  • User import: imports users from a CSV file. To use this importer, go to http://www.example.com/import.
  • Fast feed (only available if the Data module is installed): similar to the "Feed" configuration, except that it creates simple database records instead of nodes from feed items.

To use any of these default importers, simply activate it by checking its "enabled" box.

Creating an importer configuration

Of course, if the default importers don't fit your use case, you can modify them (click "override"), copy them (click "clone") or start from scratch (click "New importer").

Here is a short rundown of how to create your own importer. Copying or modifying an existing one works much the same way.

  1. Go to admin/build/feeds (D7: admin/structure/feeds) and click "New importer".
  2. Add a name and a description.
  3. Click "Create". You will be taken to the importer's configuration page. From here on, configuring a new importer works essentially the same way as modifying or copying an existing one.
  4. Go to "Basic settings". Decide whether the importer should be used on a standalone form or by creating a node ("Attached to content type"), and whether it should periodically refresh the feed and at what interval ("Minimum refresh period").
  5. Click "Change" next to "Fetcher" and pick a fetcher suitable for your job. Do the same for "Parser" and "Processor".
  6. Review the settings of the fetcher, parser and processor and adjust them to your job's requirements.
  7. Under "Processor", click "Mapping" and define which elements of the feed ("Sources", e.g. the published date of a feed item) map to which elements of the Drupal entities ("Targets", e.g. a node type's fields). The legend at the bottom of the mapping page explains the available mapping sources and targets. This step is mandatory: if you skip it, the importer will create empty entities.
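Conceptually, a mapping is a list of source-to-target pairs applied to every parsed item, which is also why skipping step 7 produces empty entities. The plain-PHP sketch below illustrates only that idea; it is not Feeds' internal mapping code, and the helper `example_map_item()` as well as the source and target names are hypothetical.

```php
<?php
// Conceptual sketch only: example_map_item() is a hypothetical helper,
// not part of the Feeds API. Each mapping copies one source value from a
// parsed item onto one target of the entity being built.
function example_map_item(array $item, array $mappings) {
  $entity = array();
  foreach ($mappings as $mapping) {
    $source = $mapping['source'];
    if (array_key_exists($source, $item)) {
      $entity[$mapping['target']] = $item[$source];
    }
  }
  return $entity;
}

// A parsed feed item (hypothetical source names).
$item = array('title' => 'First item', 'timestamp' => 1323302400);

// Hypothetical mappings: item title -> node title,
// published date -> node "created" property.
$mappings = array(
  array('source' => 'title', 'target' => 'title'),
  array('source' => 'timestamp', 'target' => 'created'),
);

print_r(example_map_item($item, $mappings));

// With no mappings defined, nothing is copied: the entity stays empty.
var_dump(example_map_item($item, array()) === array());  // bool(true)
```

Sources that no mapping mentions are simply ignored, so adding a field to a content type has no effect on imports until a mapping targets it.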

Read more in Creating/editing importers

Use the glossary

Confused by the terminology? Take a look at the Feeds glossary for an overview of the terms used in Feeds.

Requirements and Installation

Install Feeds like any other Drupal module. If you are installing it for the first time, make sure you enable the Feeds, Feeds Admin UI and Feeds Defaults modules, all of which are included in the download. Don't forget to configure cron! Feeds also requires PHP's cURL library to be installed (e.g. the PHP5-Curl package); see http://drupal.org/node/731918.

Required modules:

Consult the README.txt file included in the module for details on requirements and installation.

Exportables and default hook

Every importer configuration can be exported. Go to admin/build/feeds and click "export". Copy the exported code and paste it into an implementation of hook_feeds_importer_default() in your module.

The export code will populate a variable called $feeds_importer. At the end of the hook, copy $feeds_importer into an export array and return it.

Here is an example:

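<?php
/**
 * Implements hook_feeds_importer_default().
 *
 * Default definition of 'myimporter'.
 */
function mymodule_feeds_importer_default() {
  $export = array();

  $feeds_importer = new stdClass;
  // TRUE ships this importer disabled; set it to FALSE to enable it by default.
  $feeds_importer->disabled = TRUE;
  $feeds_importer->api_version = 1;
  $feeds_importer->id = 'myimporter';
  $feeds_importer->config = array(
    // ...
  );

  // Key the export array by importer ID and return it.
  $export['myimporter'] = $feeds_importer;
  return $export;
}
?>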

For Feeds to find this hook, your module must also declare it by implementing hook_ctools_plugin_api():

<?php
/**
 * Implements hook_ctools_plugin_api().
 */
function mymodule_ctools_plugin_api($module = '', $api = '') {
  if ($module == "feeds" && $api == "feeds_importer_default") {
    // The current API version is 1.
    return array("version" => 1);
  }
}
?>

Alternatively, you can use Features to export Feeds configuration.

Performance

Using the Feeds module, how many feeds can be downloaded, and how often?

Unfortunately, this question is impossible to answer in general. Overall, aggregation performance depends on:

  • Your server's CPU and storage I/O performance.
  • Your server's network connection.
  • The content type being created (complex CCK content type? simple Data record?).
  • The activity of your feeds being processed (many new items per run?).
  • The number of feeds being processed.
  • The parser being used (not as critical as other factors).

As performance degrades, feeds usually appear stale: no new items show up even though the original feed was updated a while ago.

Staleness increases with the number of feeds you add. A good measure of overall aggregation performance is the time difference between the most recently updated feed and the least recently updated feed:

# my_importer_id is the id of the importer to be examined (can be looked up in feeds_importer table).
SELECT MAX(last) - MIN(last) FROM job_schedule WHERE id = 'my_importer_id';

The result of this query is a time span in seconds. For instance, a result of 3600 means that 1 hour separates the feed that was just updated from the feed that has gone longest without an update.

To make sure that results are sane, also compare against current time:

# Watch out: UNIX_TIMESTAMP() returns the database server's time, which may differ from PHP's. On PostgreSQL, use date_part('epoch', now()) instead.
SELECT UNIX_TIMESTAMP() - MIN(last) FROM job_schedule WHERE id = 'my_importer_id';
SELECT UNIX_TIMESTAMP() - MAX(last) FROM job_schedule WHERE id = 'my_importer_id';
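If you want to track these figures from a monitoring script rather than by hand, the same arithmetic can be done in PHP once the `last` values have been read from job_schedule. The sketch below assumes you already have those timestamps in an array; `example_feed_staleness()` is a hypothetical helper, and measuring against PHP's `time()` avoids mixing database and PHP clocks.

```php
<?php
// Hypothetical helper: given the 'last' column values (UNIX timestamps)
// of one importer's rows in job_schedule, compute the same figures as the
// SQL queries above. $now defaults to PHP's clock.
function example_feed_staleness(array $last_runs, $now = NULL) {
  if (empty($last_runs)) {
    return NULL;  // No scheduled feeds for this importer.
  }
  if ($now === NULL) {
    $now = time();
  }
  $newest = max($last_runs);
  $oldest = min($last_runs);
  return array(
    'spread' => $newest - $oldest,   // MAX(last) - MIN(last)
    'oldest_age' => $now - $oldest,  // UNIX_TIMESTAMP() - MIN(last)
    'newest_age' => $now - $newest,  // UNIX_TIMESTAMP() - MAX(last)
  );
}

// Three feeds last updated 10, 40 and 70 minutes before $now.
$now = 1325548800;
$stats = example_feed_staleness(array($now - 600, $now - 2400, $now - 4200), $now);
print_r($stats);
// spread = 3600 (1 hour), oldest_age = 4200, newest_age = 600
```

A spread that keeps growing as you add feeds, while newest_age stays small, points to aggregation not keeping up rather than to cron not running at all.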

Performance: tuning

I am experiencing performance problems: feeds are not updating as often as they should.

Here are some options if you experience performance issues with Feeds:

1. Make sure cron runs often enough, for example every 6 minutes.
2. Run cron with drush.
3. Download and install the Drupal Queue module, and follow its README file closely to set it up correctly *).
4. Alternatively, use Superfeedr (http://superfeedr.com) as a dedicated PubSubHubbub hub (see the Feeds README file).
5. Improve system resources: analyze bottlenecks. Chances are your storage I/O maxes out, as heavy aggregation involves a lot of writes. The exact remedies depend on your findings but could include one or more of these: tune database settings, move the database to a separate server, add RAM to the database server, rearchitect to use a lighter storage model like Data, etc.
6. If you are using MySQL, be aware that most of what Feeds does, by its nature, is update data in the database, so these writes are recorded in your binary log. Importing large feeds means LOTS of entries in the binary log file(s). Make sure you have enough disk space for these logs and don't keep them longer than you need; see the MySQL Binary Log page for more. If you run out of space on your logging drive, your Drupal site will stop working until you fix it.

*) Drupal Queue moves the actual aggregation work to a process separate from cron.php. This makes it an ideal way to improve performance if other cron jobs, such as search, are already taxing the system. As queues can be worked off concurrently, aggregation speed can improve considerably. The downside of concurrent aggregation is that resource consumption can peak more aggressively, leading to high loads that in turn make the server sluggish.


About this page

Drupal version
Drupal 6.x, Drupal 7.x
Audience
Developers and coders, Site administrators
Drupal’s online documentation is © 2000-2012 by the individual contributors and can be used in accordance with the Creative Commons License, Attribution-ShareAlike 2.0. PHP code is distributed under the GNU General Public License.