Basic architecture and component interaction

Last updated on 1 July 2017

This page describes the Search API's basic architecture, its different components and how they interact.

A short description of most components is available in the Glossary. Although those descriptions are aimed at new users of the module rather than at developers, they should give a good impression of what each component does.
One component missing there, since it doesn't concern normal users of the module at all, is the datasource controller:

Datasource controller: Datasource controllers provide definitions of and functions related to item types. They control how items of a certain type are loaded and keep track of what items there are of a certain type. Each item type is associated with exactly one datasource controller. E.g., included with the Search API is the entity datasource controller, which provides all entity types as item types.

For more detailed descriptions of the components, see the other pages in this section. The rest of this page gives a general overview of the architecture and then shows how the components interact in a few common use cases.

Basic architecture

As you can see, the index acts as the primary interface to modules outside of the Search API. These modules will usually only use data from the index, not, e.g., from the server directly. This means that functionality will (largely) stay the same when switching an index to a different server.

Upon creation of the index, the item type is selected and stays the same over the entire lifetime of the index. Each item type is provided by a certain datasource controller, which is responsible for loading items of that type, providing metadata for them, keeping track of which items exist of that type, etc. When the index needs to know about any of these things, it contacts the datasource controller, passing the item type in question.

The choice of the other components associated with an index can be changed at any time.
Each index is associated with a server, or with none at all. If it isn't associated with (“lies on”) a server, the index cannot be used, as the server is what's being used for actual data storage and retrieval. You can still change most of the configuration, including for most third-party modules, when the index is not on any server (or otherwise disabled), though.

Furthermore, an index can use any number of data alterations and processors, which can change certain aspects of the indexing and searching workflows. Each of these can be configured on a per-index basis.

A server represents a certain way of indexing data, consisting of the code implementing the functionality and the settings provided by the user. The service class associated with the server is what provides the code and defines the available settings. It is chosen upon creation of the server and, like the index's item type, cannot be changed afterwards.

Retrieving item metadata

Retrieving item metadata consists of getting a description of all available (or all indexed) fields on a certain index. It's therefore done with a method on the index, getFields().

The index then uses the datasource controller of its item type to get an Entity API metadata wrapper describing the structure of an item of that type. Since data alterations can alter the items' structure (add fields, remove fields or change the type of fields), the data alterations enabled for the index are passed to the datasource controller as well and are then also called to change the metadata accordingly.
The index then uses the returned, modified metadata wrapper to construct an array of all fields with the necessary information. (This data is also cached persistently, to speed up subsequent calls for larger sites.)
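As a minimal sketch of how a module might consume this metadata, the helper below reduces the array returned by getFields() to a map of field identifier to data type. The function name example_index_field_types() is hypothetical, and the 'type' key is an assumption based on the field descriptions in 7.x-1.x, so check it against the version you use.

```php
<?php

/**
 * Maps each field identifier of an index to its data type.
 *
 * Hypothetical helper, not part of the Search API.
 *
 * @param object $index
 *   A loaded index, or any object with a compatible getFields() method.
 *
 * @return array
 *   Data types keyed by field identifier; NULL where no type is reported.
 */
function example_index_field_types($index) {
  $types = array();
  foreach ($index->getFields() as $key => $field) {
    // Assumption: each field description carries a 'type' key.
    $types[$key] = isset($field['type']) ? $field['type'] : NULL;
  }
  return $types;
}
```

Inside Drupal this could be called as example_index_field_types(search_api_index_load('default_node_index')), where the machine name is only an example.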

Indexing

Indexing starts with the datasource controller's getChangedItems() method, which determines which items need to be indexed for a given index. Its loadItems() method is then used to load these items, which are passed to the index; the index takes control of the rest of the workflow.

First, the items are passed to the data alterations enabled for the index, so they can change the available fields on the items. At this point the index also adds the search_api_language field to the items, which is guaranteed to be available for all items. In the next step, the index then extracts the raw field data from the altered items.
This data is then passed to all of the index's enabled processors, which further modify the data. (These cannot add or remove fields or items anymore, though.)
Finally, the preprocessed data is passed to the server's indexItems() method, where the actual indexing happens in a way specific to the service class in use. The successfully indexed items are then reported back, so the datasource controller can mark them as indexed.
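The order of these steps can be sketched as a small helper that receives the index and its datasource controller as arguments. This is an illustration, not the module's own code: example_index_batch() is a hypothetical name, and the index() and trackItemIndexed() calls are assumptions recalled from 7.x-1.x that should be verified before use.

```php
<?php

/**
 * Indexes one batch of changed items for an index.
 *
 * Hypothetical helper that mirrors the workflow described above.
 *
 * @param object $index
 *   The index; assumed to provide an index() method.
 * @param object $datasource
 *   The datasource controller of the index's item type.
 * @param int $limit
 *   Maximum number of items to process in this batch.
 *
 * @return array
 *   The IDs of the items that were indexed successfully.
 */
function example_index_batch($index, $datasource, $limit = 50) {
  // 1. Ask the datasource controller which items need (re-)indexing.
  $ids = $datasource->getChangedItems($index, $limit);
  if (!$ids) {
    return array();
  }

  // 2. Load the items through the datasource controller.
  $items = $datasource->loadItems($ids);
  if (!$items) {
    return array();
  }

  // 3. Hand the items to the index (assumed method). Per the text, the index
  // runs data alterations, extracts the field data, runs the processors and
  // calls the server's indexItems().
  $indexed = $index->index($items);

  // 4. Report the successfully indexed items back (assumed method).
  if ($indexed) {
    $datasource->trackItemIndexed($indexed, $index);
  }
  return $indexed;
}
```

In practice a module would normally rely on the Search API's own indexing functions rather than re-implementing this loop; the sketch only makes the division of labour visible.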

Searching

To execute a search, you first need a SearchApiQueryInterface object. You usually get one by calling the search_api_query() function, which automatically loads the passed index and creates a query object for it. (As the diagram shows, the service class is really the one responsible for creating the query object. However, this can be viewed as an oddity: it is usually irrelevant, as all known service classes use the standard implementation. This additional indirection will most likely be removed in future versions of the API.)

Once you have a query object, it can be used very much like a database select object. The methods available are documented in detail in includes/query.inc as doxygen documentation. (Also available at drupalcontrib.org in a formatted form.)
Common methods you would use here are: keys() to set the search keys (how these are interpreted is determined by the parse mode option); condition() (possibly multiple times) to apply additional filters to the query; filter() to add more complex filters, e.g., with an OR conjunction (use createFilter() first to get the filter object); sort() to determine the sort order of the results; range() to limit the number of results returned and/or apply an offset; and setOption() to set additional options for the query (facets and spellchecking, for example, are applied this way).
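The methods listed above could be combined as in the following sketch. The function name is hypothetical, and the field names (status, type, created) as well as the 'search id' option value are assumptions: they only work if the index in question actually has those fields indexed.

```php
<?php

/**
 * Applies keys, filters, sorting and paging to a query object.
 *
 * Hypothetical helper; field names are assumptions about the index.
 *
 * @param object $query
 *   A query object, e.g. as returned by search_api_query().
 * @param string $keys
 *   The search keys entered by the user.
 *
 * @return object
 *   The same query object, ready to be executed.
 */
function example_configure_query($query, $keys) {
  // Fulltext keys; their interpretation depends on the parse mode.
  $query->keys($keys);

  // Simple filter: only published items (assumed "status" field).
  $query->condition('status', 1);

  // More complex filter: type is "article" OR "page".
  $filter = $query->createFilter('OR');
  $filter->condition('type', 'article');
  $filter->condition('type', 'page');
  $query->filter($filter);

  // Newest first, first ten results only (offset 0, limit 10).
  $query->sort('created', 'DESC');
  $query->range(0, 10);

  // Additional option; the identifier value is purely illustrative.
  $query->setOption('search id', 'example:search');

  return $query;
}
```

Inside Drupal, the query object would typically come from search_api_query('default_node_index'), with the index machine name replaced by your own.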

Once the query is configured as desired, use its execute() method to execute the search. This will preprocess the query object (among other things, by invoking all enabled processors) and then pass it to the service class's search() method, where the actual searching takes place. The results are then returned in the format specified by SearchApiQueryInterface::execute() to the query object, where the processors are once again invoked to post-process the search results.
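A short sketch of consuming the return value of execute() follows. The helper name is hypothetical, and the array keys used ('result count', 'results' and each result's 'id') are assumptions recalled from the 7.x-1.x documentation of SearchApiQueryInterface::execute(); confirm them in includes/query.inc.

```php
<?php

/**
 * Executes a query and extracts the total count and the result IDs.
 *
 * Hypothetical helper; the result array keys are assumptions.
 *
 * @param object $query
 *   A fully configured query object.
 *
 * @return array
 *   An array with the keys "total" (int) and "ids" (list of item IDs).
 */
function example_result_ids($query) {
  $response = $query->execute();

  $ids = array();
  if (!empty($response['results'])) {
    foreach ($response['results'] as $result) {
      $ids[] = $result['id'];
    }
  }

  return array(
    'total' => isset($response['result count']) ? $response['result count'] : 0,
    'ids' => $ids,
  );
}
```

The returned IDs identify items of the index's item type, so loading the full items is a separate step. A failing search may raise an exception, so production code would likely wrap the call in a try/catch block.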

(If you'd like to improve any of these diagrams, please contact drunken monkey for the underlying Dia files.)