Document Loader provides a consistent, plugin-driven API for ingesting documents from multiple sources and normalizing them into reusable output formats. It standardizes discovery, configuration, and execution of document loaders so other modules can focus on their own features instead of wiring bespoke ingestion logic.

Features

  • Attribute-based plugins for registering loaders and loader types
  • Runtime discovery via Drupal plugin managers with cache support
  • Configurable defaults that map loader types to concrete loader plugins
  • Common input/output interfaces to keep transport details decoupled
  • Reusable input/output types for standard formats (JSON, CSV, Markdown, etc.)
  • Integration points within the Drupal user interface (MDXEditor, Field Widget Actions, etc.)
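
The registration and default-mapping ideas in the list above can be illustrated with a small, language-neutral sketch. It is written in Python purely for brevity and is not the module's PHP API: the decorator stands in for an attribute-based plugin definition, and every name here (document_loader, LOADERS, DEFAULTS, load, the plugin IDs) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class LoaderPlugin:
    """A registered loader: its ID, the loader type it serves, and its callable."""
    plugin_id: str
    loader_type: str
    load: Callable[[str], str]


# Hypothetical in-memory registry, standing in for a plugin manager's discovery.
LOADERS: Dict[str, LoaderPlugin] = {}


def document_loader(plugin_id: str, loader_type: str):
    """Decorator standing in for an attribute that registers a loader plugin."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        LOADERS[plugin_id] = LoaderPlugin(plugin_id, loader_type, func)
        return func
    return register


@document_loader("plain_text", loader_type="text")
def load_plain_text(source: str) -> str:
    # Normalize by trimming surrounding whitespace.
    return source.strip()


@document_loader("markdown_basic", loader_type="markdown")
def load_markdown(source: str) -> str:
    # Normalize by dropping leading '#' heading markers from each line.
    return "\n".join(line.lstrip("#").strip() for line in source.splitlines())


# Assumed configuration: each loader type maps to one concrete default plugin.
DEFAULTS: Dict[str, str] = {"text": "plain_text", "markdown": "markdown_basic"}


def load(loader_type: str, source: str) -> str:
    """Resolve the default plugin for a loader type and run it on the source."""
    plugin_id = DEFAULTS.get(loader_type)
    if plugin_id is None or plugin_id not in LOADERS:
        raise LookupError(f"No loader plugin configured for type '{loader_type}'")
    return LOADERS[plugin_id].load(source)
```

Callers in this sketch ask for a loader type rather than a specific plugin, so changing the entry in DEFAULTS swaps the implementation without touching calling code. That is the decoupling the feature list describes.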

Installation

  1. Install the module as you normally would
  2. Install at least one of the recommended modules below so that a Document Loader plugin is available
  3. Visit admin/config/media/document-loader to test loading your own documents

Recommended Modules

Module                   Document Types
PDF Parser               PDF
Webpage                  Remote Web Pages
PHPWord                  Word, ODT, RTF
AI File To Text          Word, Spreadsheet, CSV, Text, Markdown (uses the AI module)
Plugin API               HTTP/HTTPS API Calls
HTML Processor           Web Pages
Parquet                  Parquet
AI Simple PDF To Text    Deprecated in favor of AI File To Text