This project is not covered by Drupal’s security advisory policy.

This Drupal 10/11 module extends the Feeds module with a Paginated HTTP Fetcher: a fetcher plugin that automatically walks through every page of a paginated API endpoint and delivers the combined result to the standard Feeds parse/process pipeline.

The standard Feeds HTTP fetcher retrieves a single URL and hands the raw response to the parser. This module replaces that single-request fetch with a loop that:

  • Fetches page 1 of the API.
  • Extracts the items array from the response (configurable).
  • Determines whether there is a next page (using one of four strategies — see below).
  • Repeats until there are no more pages, or a configured page limit is reached.
  • Merges all collected items into a single JSON array.
  • Returns that merged array to the Feeds pipeline as a RawFetcherResult.

The parser and processor configured on the feed type receive the merged data exactly as if the entire dataset had come from one page.
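
The fetch loop described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the module's PHP code: fetch_all, fetch_page, and the FAKE_API table are hypothetical names, and the example.com URLs are placeholders.

```python
import json
from typing import Callable, Optional

# Hypothetical shape: fetching one page yields (items, next_url_or_None).
Page = tuple[list, Optional[str]]


def fetch_all(start_url: str,
              fetch_page: Callable[[str], Page],
              max_pages: int = 0) -> str:
    """Walk the pages and return all items merged into one JSON array string."""
    merged: list = []
    url: Optional[str] = start_url
    pages = 0
    while url is not None:
        items, url = fetch_page(url)      # fetch one page and learn the next URL
        merged.extend(items)
        pages += 1
        if max_pages and pages >= max_pages:
            break                         # configured page limit reached (0 = unlimited)
    return json.dumps(merged)


# In-memory stand-in for a two-page API (assumed data, for illustration only).
FAKE_API = {
    "https://api.example.com/items?page=1":
        ([{"id": 1}, {"id": 2}], "https://api.example.com/items?page=2"),
    "https://api.example.com/items?page=2":
        ([{"id": 3}], None),
}

result = fetch_all("https://api.example.com/items?page=1", FAKE_API.__getitem__)
```

The parser only ever sees the merged string held in result, which is why the rest of the pipeline needs no knowledge of pagination.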

Features

Pagination Strategies

  • Page number — increments a query parameter (e.g. ?page=1&per_page=100); configurable parameter name and starting value (0 or 1)
  • Offset — increments an offset parameter (e.g. ?offset=0&limit=100); configurable offset and limit parameter names
  • Link header (RFC 5988) — follows rel="next" from HTTP Link response headers
  • JSON next link — reads the next-page URL from a dot-notation path inside the JSON response body (e.g. links.next, pagination.next_url)
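
As a rough illustration of how the four strategies might each arrive at the next URL, here is a Python sketch. It is not the module's implementation: with_query, next_from_link_header, and next_from_json are hypothetical helpers, and the Link header parsing is deliberately simplified (it assumes no commas inside URLs and a single rel value per link). The Link header format is now specified in RFC 8288, which superseded RFC 5988.

```python
import re
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query(url: str, **params) -> str:
    """Page number / offset strategies: set or overwrite query parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({key: str(value) for key, value in params.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


def next_from_link_header(header: str) -> Optional[str]:
    """Link header strategy: return the target of the rel="next" link, if any."""
    for part in header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;(.*)', part)
        if match and re.search(r'rel\s*=\s*"?next\b', match.group(2)):
            return match.group(1)
    return None


def next_from_json(body: dict, path: str) -> Optional[str]:
    """JSON next link strategy: follow a dot-notation path such as links.next."""
    node = body
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, str) and node else None


page_url = with_query("https://api.example.com/items?page=1&per_page=100", page=2)
offset_url = with_query("https://api.example.com/items?offset=0&limit=100", offset=100)
```

For the first two strategies the client computes the next URL itself; for the last two the server supplies it, which is why those URLs need validating before they are followed.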

Item Extraction

  • Items key — dot-notation path to extract the items array from a response wrapper (e.g. data, results.items)
  • Root-level JSON array support — when items key is empty, uses the response array directly
  • Single-object wrapping — a bare JSON object response is automatically wrapped in an array
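
The three extraction rules above might look like this in a minimal Python sketch. extract_items is a hypothetical name, and raising an error when the path is missing is an assumption of this sketch; the module's actual behavior in that case is not described here.

```python
def extract_items(decoded, items_key: str = "") -> list:
    """Return the items list from a decoded JSON response."""
    node = decoded
    if items_key:
        # Follow the dot-notation path, e.g. "results.items".
        for key in items_key.split("."):
            if not isinstance(node, dict) or key not in node:
                # Assumption: a missing path is treated as an error.
                raise ValueError(f"items key {items_key!r} not found")
            node = node[key]
    if isinstance(node, list):
        return node          # root-level array, or the array found at the path
    if isinstance(node, dict):
        return [node]        # single object: wrap it in an array
    raise ValueError("response is neither an array nor an object")


wrapped = extract_items({"results": {"items": [{"id": 1}]}}, "results.items")
bare = extract_items({"id": 7})
```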

Batch Mode

  • Pages per batch — spreads fetching across multiple cron runs; persists resumption state between runs
  • Resumes correctly from the exact page, base URL, and current URL on the next cron run
  • Signals Feeds with setCompleted() when all pages are done
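
To make the resumption idea concrete, the following Python sketch fetches a limited number of pages per run and carries its position in a small state dictionary. Everything here is illustrative: run_batch, the state keys, and the PAGES table are hypothetical, and the JSON round trip merely stands in for whatever storage the module uses between cron runs.

```python
import json


def run_batch(state: dict, fetch_page, pages_per_batch: int):
    """Fetch up to pages_per_batch pages; return (items, completed)."""
    items = []
    for _ in range(pages_per_batch):
        page_items, next_url = fetch_page(state["current_url"])
        items.extend(page_items)
        if next_url is None:
            return items, True            # all pages done
        state["page"] += 1                # remember where the next run resumes
        state["current_url"] = next_url
    return items, False                   # batch limit hit, more pages remain


# Assumed three-page API: page URL -> (items, next URL or None).
PAGES = {
    "https://api.example.com/items?page=1": ([1, 2], "https://api.example.com/items?page=2"),
    "https://api.example.com/items?page=2": ([3], "https://api.example.com/items?page=3"),
    "https://api.example.com/items?page=3": ([4], None),
}

state = {
    "base_url": "https://api.example.com/items",
    "current_url": "https://api.example.com/items?page=1",
    "page": 1,
}

# First cron run: two pages, then stop and persist the state.
first_items, first_done = run_batch(state, PAGES.__getitem__, pages_per_batch=2)
state = json.loads(json.dumps(state))     # simulated save and reload

# Second cron run: resumes at page 3 and finishes.
second_items, second_done = run_batch(state, PAGES.__getitem__, pages_per_batch=2)
```

In the module, the point where second_done becomes true corresponds to the setCompleted() signal mentioned above.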

Memory & Execution Time Protection

  • Streaming temp file accumulation — writes items to a PHP tmpfile() page-by-page instead of accumulating in a PHP array, reducing peak memory usage
  • Memory threshold — stops the current batch early and saves state if PHP memory usage exceeds a configurable % of memory_limit (default 80%); skipped when memory_limit = -1
  • Execution time threshold — stops early if elapsed time exceeds a configurable % of max_execution_time (default 80%); skipped when max_execution_time = 0
  • Both thresholds log a warning and persist state so the next cron run resumes without data loss
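
The two protections can be illustrated with a short Python sketch: a threshold check that is skipped for unlimited settings, and a spool that appends each page's items to a temporary file instead of holding them in a list. The names exceeds_threshold and ItemSpool are hypothetical, and the on-disk layout (one JSON array written incrementally) is an assumption rather than the module's actual format.

```python
import json
import tempfile


def exceeds_threshold(current: float, limit: float, percent: int,
                      unlimited: float) -> bool:
    """True when current usage is above percent % of limit.

    The check is skipped when limit equals the 'unlimited' sentinel
    (-1 for memory_limit, 0 for max_execution_time).
    """
    if limit == unlimited:
        return False
    return current > limit * percent / 100


class ItemSpool:
    """Accumulate items page by page in a temp file as one JSON array."""

    def __init__(self) -> None:
        self._file = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        self._file.write("[")
        self._count = 0

    def add_page(self, items: list) -> None:
        for item in items:
            if self._count:
                self._file.write(",")
            json.dump(item, self._file)   # write straight to disk
            self._count += 1

    def finish(self) -> str:
        self._file.write("]")
        self._file.seek(0)
        data = self._file.read()
        self._file.close()
        return data


MIB = 1024 * 1024
stop_for_memory = exceeds_threshold(110 * MIB, 128 * MIB, 80, unlimited=-1)
stop_for_time = exceeds_threshold(25, 30, 80, unlimited=0)
```

With a 128 MiB limit and the default 80%, the cutoff sits at about 102 MiB, so a run using 110 MiB would stop early and save its state.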

Resilience & Retry

  • Retry on transient failures — configurable retry count (default 3) with exponential backoff
  • Retries on: connection errors (ConnectException), HTTP 5xx (ServerException), HTTP 429 rate limit (ClientException 429)
  • Retry-After header support — respects the server-supplied delay on 429 responses
  • Non-retryable errors (4xx other than 429, invalid JSON) fail immediately
  • Each retry attempt is always logged as a warning regardless of the verbose logging setting
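
A retry loop with exponential backoff and Retry-After support could be sketched as below. This is Python for illustration only: TransientError stands in for the Guzzle exceptions listed above, fetch_with_retry is a hypothetical name, and the one-second base delay and doubling factor are assumptions, since the exact backoff schedule is not stated here.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure (connection error, 5xx, or 429)."""

    def __init__(self, message: str, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after    # seconds from a Retry-After header, if any


def fetch_with_retry(do_request, max_retries: int = 3,
                     base_delay: float = 1.0, sleep=time.sleep):
    """Call do_request, retrying transient failures up to max_retries times."""
    attempt = 0
    while True:
        try:
            return do_request()
        except TransientError as exc:
            if attempt >= max_retries:
                raise                     # retries exhausted: give up
            if exc.retry_after is not None:
                delay = exc.retry_after   # honor the server-supplied delay
            else:
                delay = base_delay * 2 ** attempt   # 1s, 2s, 4s, ...
            attempt += 1
            sleep(delay)


# Usage: two transient failures, then success. Delays are recorded, not slept.
sleeps = []
failures = [TransientError("HTTP 503"), TransientError("HTTP 429", retry_after=5)]


def flaky():
    if failures:
        raise failures.pop(0)
    return "ok"


outcome = fetch_with_retry(flaky, sleep=sleeps.append)
```

Any other exception type propagates immediately, which mirrors the non-retryable case for 4xx responses other than 429 and for invalid JSON.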

Request Configuration

  • Request timeout — per-request read timeout in seconds (default 30)
  • Connection timeout — separate Guzzle connect timeout (default 10), independent of the read timeout
  • Extra query parameters — static URL-encoded params appended to every request (e.g. api_key=abc&format=json)
  • Custom request headers — one Name: Value per line (e.g. Authorization: Bearer token)
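
The two free-text settings above have simple formats. As a sketch under stated assumptions, this Python snippet shows one way such input could be parsed; parse_headers and parse_extra_params are hypothetical helpers, and silently skipping malformed header lines is a choice made for this example only.

```python
from urllib.parse import parse_qsl


def parse_headers(text: str) -> dict[str, str]:
    """Parse 'Name: Value' lines into a header dictionary."""
    headers: dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue                      # assumption: skip lines without a colon
        name, value = line.split(":", 1)  # split on the first colon only
        name, value = name.strip(), value.strip()
        if name:
            headers[name] = value
    return headers


def parse_extra_params(text: str) -> dict[str, str]:
    """Parse a URL-encoded string such as 'api_key=abc&format=json'."""
    return dict(parse_qsl(text))


headers = parse_headers("Authorization: Bearer token\nAccept: application/json\n")
params = parse_extra_params("api_key=abc&format=json")
```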

Safety & Security

  • Maximum pages — hard cap on total pages fetched per import run (0 = unlimited)
  • SSRF protection — server-supplied next-page URLs (Link headers, JSON next link) are validated; only http:// and https:// schemes accepted
  • HTTP header injection prevention — CR/LF characters stripped from all custom header names and values
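
The two checks above are small enough to sketch in a few lines of Python. The function names are hypothetical, and the additional requirement that the URL contain a host is an assumption of this sketch that goes beyond the scheme check described here.

```python
from urllib.parse import urlsplit


def is_allowed_next_url(url: str) -> bool:
    """Accept only http:// and https:// URLs supplied by the server."""
    parts = urlsplit(url)
    # The non-empty host check is an extra assumption of this sketch.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def strip_crlf(text: str) -> str:
    """Remove CR and LF so a value cannot start a new header line."""
    return text.replace("\r", "").replace("\n", "")


safe = is_allowed_next_url("https://api.example.com/items?page=2")
blocked = is_allowed_next_url("file:///etc/passwd")
cleaned = strip_crlf("token\r\nX-Injected: 1")
```

Without the CR/LF stripping, a configured value like the one passed to strip_crlf would add a second, attacker-chosen header to every request.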

Logging

  • Verbose import logging — per-feed toggle; logs page URLs, item counts, pagination decisions, batch state changes, and resource threshold triggers
  • Always-on error logging — HTTP failures, invalid JSON, non-array responses, and retry attempts are always written to the feeds_paginated_fetcher watchdog channel regardless of the verbose flag

UI & Configuration

  • Per-feed configuration form with conditional field visibility (strategy-specific fields shown/hidden via #states)
  • Settings placed in a Pagination settings vertical tab alongside Feeds' built-in tabs
  • Form validation for URL scheme, extra query params format, per-page minimum, next-link path requirement, and percentage field ranges
  • All settings have sensible defaults; existing feeds without new config keys automatically receive defaults

Compatibility

  • Drupal 10 and 11
  • Requires the Feeds 3.x contrib module
  • PHP 8.3+

Post-Installation

See README.md for the post-installation configuration steps, with examples.
