Last updated December 13, 2013. Created by drunken monkey on August 19, 2011.
Edited by jhedstrom, balintk.

This page lists and briefly explains all data alterations and processors currently available. Unless otherwise noted, they are part of the core Search API module.

Note that not all data alterations and processors are necessarily available for a given index. Availability usually depends on the index's item type. For instance, the Bundle filter data alteration isn't available for indexes on item types which don't define any bundles (or define only a single one).

Data alterations

Bundle filter
Lets you prevent entities from being indexed based on their bundle (content type for nodes, vocabulary for taxonomy terms, etc.). This way you can, for instance, create an index solely for news.
Language control
Allows you to control the language of items stored in the index. It provides two features:
  • Normally, the content of the Item language property (which is automatically added by the Search API for all indexed items) is determined by the item's language property, if available, and otherwise set to undefined. With this data alteration, you can select any other property as an alternative source for the item language, which will then be used instead. Note that, for this to work, the selected field has to contain a single valid ISO language code for each item.
  • You can also select which languages items in this index may have. Items with any other language (as defined by the Item language property) will be rejected during indexing.
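The two features above can be illustrated with a small Python sketch. This is not Search API code: the item dictionaries, the `item_language` key, the function names and the use of `"und"` as the "undefined" code are all assumptions made for the example.

```python
# Conceptual sketch only; names and data layout are hypothetical.
UNDEFINED = "und"  # assumed code for "language undefined"


def resolve_language(item, source_property="language"):
    """Return the item's language from the chosen source property."""
    value = item.get(source_property)
    if isinstance(value, str) and value:
        return value
    return UNDEFINED


def filter_by_language(items, allowed, source_property="language"):
    """Keep only items whose resolved language is in `allowed`."""
    kept = []
    for item in items:
        lang = resolve_language(item, source_property)
        if lang in allowed:
            # Store the resolved language alongside the item's other data.
            kept.append({**item, "item_language": lang})
    return kept


items = [
    {"id": 1, "language": "en"},
    {"id": 2, "language": "de"},
    {"id": 3},  # no language property, resolves to "und"
]
indexed = filter_by_language(items, allowed={"en", "und"})
```

Here the German item would be rejected, while the English item and the item without a language would be kept.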
Node access
Adds node access checks to searches on this index. This is done by adding a new field, Node access information, that stores the relevant access data. When the Node access information, Author, and Status fields are present and indexed, appropriate filters will be automatically added to all searches so that they only return results that the current user is allowed to view. Some searches (e.g., search views) provide the option to override this behaviour on a per-search basis, though. Check the corresponding module's documentation for details.
In any case, keep in mind that these access checks are based solely on the indexed data. If a node is edited in a way that changes its accessibility (e.g., by being unpublished), the change only takes effect once the node has been indexed in its latest state. There can therefore be a gap between changing the node and the update of the access checks on search results. During that gap, depending on the data displayed for search results, users could see data that should not be accessible to them. If you need to avoid that, use the index's Index items immediately option.
Also note that access to individual fields is never checked, so don't include fields in the display if they contain sensitive data.
The data alteration is only available for node indexes.
URL field
Adds a field containing the URL at which the entity can be displayed. For some item types, like nodes, this URL is already available, but this data alteration can be used to also add it for other types.
Aggregated fields
Offers the ability to add additional fields to the entity, containing the data from one or more other fields. Use this, e.g., to have a single field containing all data that should be searchable, or to make the text from a string field, like a taxonomy term, also fulltext-searchable.
The type of aggregation can be selected from a set of values: you can, e.g., collect the text data of all contained fields, or add them up, count their values, etc.
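As a rough illustration of what an aggregated field holds, the following Python sketch combines the values of several fields of one item. It is not the module's implementation: the `aggregate` function, the mode names and the item layout are hypothetical, and only three aggregation types are shown.

```python
# Conceptual sketch only; function and mode names are hypothetical.
def aggregate(item, fields, mode="fulltext"):
    """Combine the values of `fields` in `item` into one new value."""
    values = []
    for name in fields:
        value = item.get(name)
        if value is None:
            continue
        # Multi-valued fields contribute each of their values.
        if isinstance(value, (list, tuple)):
            values.extend(value)
        else:
            values.append(value)

    if mode == "fulltext":
        # Collect the text of all contained fields into one string.
        return "\n\n".join(str(v) for v in values)
    if mode == "sum":
        return sum(values)
    if mode == "count":
        return len(values)
    raise ValueError(f"unknown aggregation mode: {mode}")


node = {"title": "Budget 2013", "tags": ["finance", "news"], "body": "Details"}
searchable = aggregate(node, ["title", "tags", "body"])
tag_count = aggregate(node, ["tags"], mode="count")
```

A single field like `searchable` could then be indexed as fulltext, which is the "one field containing all searchable data" use case described above.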
Complete entity view
Adds a field containing the whole HTML content of the entity as it is viewed on the site. The view mode used can be selected. This allows you to index exactly "what the user sees", which is often what is expected, but might differ from just indexing the contents of other fields.
Note that this might not work for items of all types. All core entity types except files are supported, though.
Index hierarchy
Allows you to index hierarchical fields along with all their parents. Most importantly, this can be used to index taxonomy term references along with all parent terms. This way, when an item, e.g., has the term New York, it will also be matched when filtering for USA or North America.
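The New York example can be sketched in Python as follows. The `PARENTS` mapping and the `with_ancestors` function are made up for illustration and assume each term has at most one parent, which real taxonomies do not guarantee.

```python
# Conceptual sketch only; assumes a single parent per term.
PARENTS = {
    "New York": "USA",
    "USA": "North America",
}


def with_ancestors(terms, parents):
    """Return each term followed by all of its ancestors, without duplicates."""
    result = []
    seen = set()
    for term in terms:
        current = term
        # Walk up the hierarchy; `seen` also guards against cycles.
        while current is not None and current not in seen:
            seen.add(current)
            result.append(current)
            current = parents.get(current)
    return result


indexed_terms = with_ancestors(["New York"], PARENTS)
```

With the parents indexed alongside the term itself, a filter on USA or North America would also match the item.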

Processors

Ignore case
Makes searches on selected fields case-insensitive. Some servers might do this automatically; for all others, this should probably always be activated, at least for fulltext fields.
HTML filter
Strips HTML tags from selected fields and decodes HTML entities. If you are indexing HTML content (like node bodies) and the search server doesn't handle HTML on its own, this should be activated to avoid indexing HTML tags. It can also give terms that appear in certain tags, e.g. a heading, a higher boost.
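The idea of stripping tags, decoding entities and boosting terms by their enclosing tag can be sketched with Python's standard `html.parser` module. This is only an illustration of the concept: the class, the `strip_html` helper, the boost values and the choice to use the highest boost among the enclosing tags are all assumptions, not the processor's actual behaviour.

```python
# Conceptual sketch only; boost handling is an assumption.
from html.parser import HTMLParser


class BoostingStripper(HTMLParser):
    def __init__(self, boosts):
        # convert_charrefs=True decodes entities such as &eacute; in text.
        super().__init__(convert_charrefs=True)
        self.boosts = boosts
        self.stack = []   # currently open tags
        self.tokens = []  # (word, boost) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the highest boost of any enclosing tag, defaulting to 1.0.
        boost = max([self.boosts.get(t, 1.0) for t in self.stack], default=1.0)
        for word in data.split():
            self.tokens.append((word, boost))


def strip_html(html_text, boosts):
    parser = BoostingStripper(boosts)
    parser.feed(html_text)
    parser.close()  # flush any trailing text
    return parser.tokens


tokens = strip_html(
    "<h1>Caf&eacute; news</h1><p>Hello <b>world</b></p>",
    {"h1": 5.0, "b": 2.0},
)
```

The resulting tokens contain no markup, and the words from the heading carry the higher boost.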
Tokenizer
This processor allows you to specify how indexed fulltext content is split into separate tokens: which characters are ignored and which are treated as whitespace that separates words.
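The distinction between ignored characters and whitespace characters can be shown with a short Python sketch. The `tokenize` function and both default patterns are assumptions chosen for the example and are not the processor's actual defaults.

```python
# Conceptual sketch only; the default patterns are hypothetical.
import re


def tokenize(text, ignorable=r"[-']", whitespace=r"[^\w]+"):
    """Split text into tokens.

    Characters matching `ignorable` are removed without splitting the word;
    runs of characters matching `whitespace` separate tokens.
    """
    text = re.sub(ignorable, "", text)
    return [token for token in re.split(whitespace, text) if token]


words = tokenize("State-of-the-art search, isn't it?")
```

Because the hyphen is ignored rather than treated as whitespace, "State-of-the-art" becomes the single token "Stateoftheart" instead of four separate words.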
Stopwords
Lets the admin specify a stopwords file; the words it contains will be filtered out of the indexed text data. This can be used to exclude overly common words from indexing on servers that don't support this natively.
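A minimal Python sketch of stopword filtering follows. It assumes a stopwords file with one word per line and case-insensitive matching; both are assumptions for the example, as are the function names.

```python
# Conceptual sketch only; the file format (one word per line) is assumed.
def load_stopwords(file_contents):
    """Parse stopwords from text with one word per line, ignoring blanks."""
    return {
        line.strip().lower()
        for line in file_contents.splitlines()
        if line.strip()
    }


def remove_stopwords(tokens, stopwords):
    """Drop tokens that are stopwords, comparing case-insensitively."""
    return [token for token in tokens if token.lower() not in stopwords]


stopwords = load_stopwords("the\nand\n\n")
remaining = remove_stopwords(
    ["The", "quick", "fox", "and", "the", "hound"], stopwords
)
```

Only the remaining tokens would then be passed on for indexing.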
