Elasticsearch AI VDB Provider

An Elasticsearch backend for the Drupal AI module and its AI Search submodule. Stores and retrieves embeddings using Elasticsearch's native dense_vector field with HNSW kNN, so semantic search and RAG work directly against your existing Elastic Stack — no separate vector database to operate.

Why this module

As of April 2026, this is the only Drupal AI VDB provider that runs hybrid search (kNN + BM25) in a single Elasticsearch request, merged with Reciprocal Rank Fusion (RRF).

Capability | This module | Other VDB providers
Approximate kNN (HNSW) | Yes | Yes
Access-control pre-filtering | Yes | Yes
Hybrid search (kNN + BM25 + RRF) | Yes | No
Reuses an existing Elastic Stack | Yes | No

Requirements

  • Drupal 10.3 or 11
  • AI module (includes the AI Search submodule)
  • Key module for credential storage
  • An Elasticsearch 8.x cluster with dense_vector support — 8.8+ for hybrid search (RRF)
  • An AI Provider module that exposes an embeddings model (OpenAI, Ollama, etc.)

Install

composer require drupal/ai_vdb_provider_elasticsearch
drush en ai_vdb_provider_elasticsearch

Configure

  1. Store credentials. At /admin/config/system/keys, create a Key entity holding either an Elasticsearch API key (recommended) or a Basic Auth password. Skip for unauthenticated local clusters.
  2. Configure the provider. At /admin/config/ai/vdb_providers/elasticsearch, set the host URL, point at your Key entity, choose an Index Prefix (e.g. drupal_) and a Similarity Metric (cosine for normalized embeddings from commercial LLMs — the safe default).
  3. Wire up Search API. At /admin/config/search/search-api, add a Server with AI Search (VDB) as the backend, select Elasticsearch as the VDB Provider, attach your AI Provider for embeddings, then add an Index and run it.

The Elasticsearch index is created automatically on first use. The similarity metric and embedding dimensions are baked into the mapping at creation time, so changing either requires dropping the index and re-indexing all content.
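To make the metric choice concrete, here is a small sketch in plain Python (not module code) showing why cosine is a safe default: for unit-length embeddings, cosine similarity and the dot product give the same value, while for unnormalized vectors only cosine ignores magnitude. The helper names are illustrative.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product divided by both magnitudes."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

# Cosine is unaffected by vector length; the raw dot product is not.
print(cosine(a, b), cosine(a, [8.0, 6.0]))  # same value twice
print(dot(a, b), dot(a, [8.0, 6.0]))        # second value is doubled

# After normalization, the dot product matches cosine similarity.
print(dot(normalize(a), normalize(b)))
```

If your embedding model does not return normalized vectors, cosine still ranks by direction only, which is why it is the lower-risk choice when you are unsure.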

Hybrid search

Pure vector search is excellent at conceptual matching but fails on exact lexical lookups such as product SKUs, acronyms, and proper names. With hybrid search enabled, every query runs both a kNN vector search and a BM25 keyword match in one Elasticsearch request, then merges the two ranked lists with RRF. No manual weight tuning is needed.
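The fusion step itself is simple. The sketch below is a standalone Python illustration of Reciprocal Rank Fusion, not the module's code; Elasticsearch performs this server-side. Each document scores 1 / (rank_constant + rank) in every list it appears in, and the scores are summed. The function name and the sample document IDs are made up for illustration; 60 is the constant commonly used for RRF and, to our knowledge, the Elasticsearch default.

```python
def rrf_merge(rankings, rank_constant=60):
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns document IDs ordered by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: the kNN list favors conceptual matches,
# the BM25 list favors exact keyword hits.
knn_hits = ["a", "b", "c"]
bm25_hits = ["c", "a", "d"]

print(rrf_merge([knn_hits, bm25_hits]))
```

Because only ranks are used, the raw kNN and BM25 scores never need to be put on a common scale, which is why no weight tuning is required.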

Content profile | Recommendation
Mostly narrative prose, queried by concept | Pure kNN (hybrid off)
Codes, acronyms, proper names mixed in | Hybrid on
RAG agent over a mixed knowledge base | Hybrid on
Elasticsearch < 8.8 | Hybrid not available

License note. RRF is a commercial Elastic feature on 8.x. On the basic (free) license, hybrid requests fail with a 403 error stating that the license is non-compliant for [Reciprocal Rank Fusion (RRF)]; either start a Platinum trial or leave the hybrid toggle off. Pure kNN works on the basic license.
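For orientation, the following Python sketch builds a search request body in both modes. It is an assumption about the general shape of such a request, not a dump of what the module sends: it uses the field names from the index mapping (vector, content, langcode and so on), the top-level knn section, and the top-level rank section that Elasticsearch 8.8 introduced for RRF (newer releases also offer a retriever-based syntax). The function name and its parameters are hypothetical.

```python
def build_search_body(query_text, query_vector, k=10, num_candidates=100,
                      filters=None, hybrid=True):
    """Build an Elasticsearch _search body for pure kNN or hybrid (kNN + BM25 + RRF).

    filters: optional dict of keyword field -> value, applied as pre-filters
             (e.g. {"langcode": "en", "entity_type": "node"}).
    """
    term_filters = [{"term": {field: value}}
                    for field, value in sorted((filters or {}).items())]

    knn = {
        "field": "vector",
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }
    if term_filters:
        # Pre-filter: restricts the candidate set before the kNN search runs.
        knn["filter"] = {"bool": {"filter": term_filters}}

    body = {"size": k, "knn": knn}

    if hybrid:
        # BM25 side of the request, restricted by the same filters.
        body["query"] = {
            "bool": {
                "must": [{"match": {"content": query_text}}],
                "filter": term_filters,
            }
        }
        # Ask Elasticsearch to fuse both ranked lists with RRF defaults.
        body["rank"] = {"rrf": {}}

    return body


hybrid_body = build_search_body("SKU-123", [0.1, 0.2], filters={"langcode": "en"})
knn_only_body = build_search_body("SKU-123", [0.1, 0.2], hybrid=False)
```

With hybrid=False the body contains no rank section, which is the shape that should remain usable on the basic license.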

Documentation

Detailed walkthroughs live in the project repository.

Index field mapping

The module maps these fields explicitly at index creation; everything else ai_search sends through is accepted via dynamic: true.

Field | Type | Purpose
vector | dense_vector | Embedding (HNSW-indexed)
entity_id | keyword | Drupal entity ID
entity_type | keyword | Pre-filter (e.g. node)
bundle | keyword | Bundle pre-filter
langcode | keyword | Language pre-filter
chunk_id | keyword | Chunk identifier within an entity
content | text | Plain text used by BM25 in hybrid mode
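As a rough picture of what that mapping looks like on the Elasticsearch side, here is a Python sketch that builds an index-creation body from the table above. It is an approximation under stated assumptions, not the module's exact output: the function name is hypothetical, 1536 is only an example dimension count, and the module may set additional options. The dense_vector parameters used (dims, index, similarity) are standard Elasticsearch 8.x mapping options.

```python
ALLOWED_SIMILARITIES = {"cosine", "dot_product", "l2_norm"}


def build_index_body(dims, similarity="cosine"):
    """Build a create-index body matching the documented field table."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    if dims <= 0:
        raise ValueError("dims must be a positive integer")

    keyword = {"type": "keyword"}
    return {
        "mappings": {
            # Unmapped fields sent by ai_search are accepted dynamically.
            "dynamic": True,
            "properties": {
                "vector": {
                    "type": "dense_vector",
                    "dims": dims,              # fixed at creation time
                    "index": True,             # enables HNSW-based kNN
                    "similarity": similarity,  # fixed at creation time
                },
                "entity_id": dict(keyword),
                "entity_type": dict(keyword),
                "bundle": dict(keyword),
                "langcode": dict(keyword),
                "chunk_id": dict(keyword),
                "content": {"type": "text"},   # analyzed text for BM25
            },
        }
    }


# Example: an embedding model that returns 1536-dimensional vectors.
index_body = build_index_body(1536)
```

Since dims and similarity live inside the vector field definition, switching embedding models or metrics means creating a new index and re-indexing.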

Maintainers

  • Ricardo Amaro
  • Looking for co-maintainers — open an issue or reach out.
