The Solr configuration files packaged with this module are organized to make customization as easy as possible. The “core files” with the base configuration for the Solr server are schema.xml and solrconfig.xml. You should never edit these directly, since your changes would have to be reapplied whenever a future version of the Search API Solr Search module changes these files (though this shouldn't happen too often).

The other files, however, only contain some default settings or only documentation, to help you customize your Solr server. These files will only rarely change, and when they do it should either be unnecessary to update your copies, or trivial to do so. Therefore, you can fill and edit them with custom settings specific to your site's needs. For the format of these files and what you can do with them, see the documentation comments included in them, or the official Solr wiki. The three *_extra*.xml files are included into schema.xml and solrconfig.xml when they are read, thus allowing you to easily add settings to them.

Remember: After changing any configuration, you will always have to restart your Solr server for the changes to take effect!

A few examples for possible customizations follow.

Changing the Solr type of a field

The schema.xml file contains several alternatives for most data types that aren't used by default. For example, for fulltext fields there are text (the default), text_ws, text_und and edge_n2_kw_text; for (long) integers, there are long (used by default), slong and tlong.

If you want to use such a type for one of your indexed fields, first find out the internal name Solr uses for the field (see "Finding out Solr's identifier for a field" below).
Then put the following inside schema_extra_fields.xml:

<fields>
   <field name="FIELD" type="TYPE" indexed="true" stored="true" multiValued="(true|false)" />
</fields>

For the right multiValued (and perhaps other) settings, it's easiest to look inside the schema.xml file for the <dynamicField> declaration with the prefix matching your field, and copy all its settings except for name and type.

So, for example, to change the Solr type of the node's body text to text_ws, use:

<fields>
   <field name="tm_body:value" type="text_ws" indexed="true" stored="true" multiValued="true" termVectors="true" />
</fields>

Sadly, due to restrictions of Solr itself, it is not possible to replace the Solr type used for a certain Search API data type altogether. If you want to do that, you will have to change it manually for all fields of that type. Dynamic fields can help a little, though: e.g., if you want to replace the type for the fields is_comment_count and is_category, you can just use <dynamicField name="is_c*" type="TYPE" … /> (provided there is no other field with that prefix which you don't want to change, which will never be a problem when changing a type completely).
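As an illustration of the dynamic-field shortcut, here is a minimal sketch. It assumes that tlong is the replacement type you want, that dynamic fields may be declared next to regular fields in schema_extra_fields.xml, and that the remaining attributes match the is_* declaration in your schema.xml (check them there rather than trusting the values shown).

```xml
<!-- schema_extra_fields.xml: sketch only; "tlong" is an assumed choice of type. -->
<fields>
  <!-- Matches is_comment_count, is_category and every other field starting with "is_c". -->
  <!-- Copy indexed, stored and multiValued from the is_* dynamicField in schema.xml. -->
  <dynamicField name="is_c*" type="tlong" indexed="true" stored="true" multiValued="false" />
</fields>
```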

Changing the language of a fulltext field

By default, all text fields in Solr will use English stemming. If you want to use stemming for a different language (or make other modifications), you'll have to create a new type with these settings and then configure the relevant fields to be indexed with this type. (How the latter is done was already explained above: just add field definitions for some or all fields with the tm_* prefix, using your custom type.)

For adding the custom text type, just copy the definition of the text type in schema.xml to schema_extra_types.xml. The type definition is the block starting with <fieldType name="text" and ending with the next </fieldType> (about 54 lines in total). Then edit the copy in schema_extra_types.xml to your liking.
First, change the identifier (in name="text" right at the beginning) to some other, not already used one – e.g., text_fr for French text. (An example for German is already included in the schema_extra_types.xml file – just remove the comment to use it.) You can use any identifier you like, though, so iwflksxf is also fine.

Then replace the two occurrences of "English" in the definition with the language of your choice. See the Solr wiki for a list of supported languages.
If you want to use several languages at once on this Solr server, and therefore can't just fill synonyms.txt, protwords.txt, etc., with settings for your language, you can also set new, language-specific files for these default files here. Just replace the respective file names in the definition.

To add more than one type, just copy one or more additional type definitions after the closing </fieldType> of the first one.

Finally, add the <field> definitions using the new type(s) to schema_extra_fields.xml as described above. Remember to change type="text_ws" to type="text_de", or whatever identifier you used in the name attribute of your type in schema_extra_types.xml.
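To make these steps concrete, here is an abbreviated sketch of a French text type. It is not the full definition copied from schema.xml, which contains more filters: the tokenizer and filter list are shortened for readability, and the file names stopwords_fr.txt, protwords_fr.txt and synonyms_fr.txt are hypothetical language-specific files you would have to create yourself.

```xml
<!-- schema_extra_types.xml: abbreviated sketch, not the complete copied definition. -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- Hypothetical language-specific file replacing the default stop word list. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_fr.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- First occurrence of the language: "English" replaced by "French". -->
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords_fr.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_fr.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_fr.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Second occurrence of the language. -->
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords_fr.txt" />
  </analyzer>
</fieldType>
```

A field such as tm_body:value would then be declared in schema_extra_fields.xml with type="text_fr".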

Creating a text type for partial matching

(For actually using that type for your fields, again, see above.)

By default, the Solr search module doesn't support partial (or substring) matching. E.g., when searching for "break", items containing "breakpoint" (or "unbreakable") aren't found. This default was selected since it returns more reliable results that don't just contain the search keys by accident, and since it will perform better for larger data sets. Also, stemming already takes care of some of these queries (see also Solr's notes about stemming).
However, on many sites users will expect partial matches to be returned. Luckily, Solr already comes equipped with text analysis tools to easily implement this for your server: the solr.NGramFilterFactory and the solr.EdgeNGramFilterFactory filters. The difference is that, with the latter, only partial matches at the beginning (or, optionally, at the end) of words will be found, while the former will find all substrings contained in a word. Which of these you want to use depends on your specific use case / site. The procedure is nearly identical in both cases, though:

First, copy a text type definition to schema_extra_types.xml and change the identifier, as described above.
Then, add the following line to the type definition after the first occurrence of "solr.SnowballPorterFilterFactory" (inside of the <analyzer type="index"> element; not after the second occurrence):
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />

If you want partial matches inside of words to be found, too, simply remove the "Edge" part from that line. In this case, you should also remove both occurrences of the solr.WordDelimiterFilterFactory filter: remove everything from the <filter that precedes that string to the "greater than" sign (>) coming after it.

Now, after also adding the field definitions, re-starting your Solr server and re-indexing your content, partial matches should be found with searches on your site.
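The placement of the n-gram filter can be sketched as follows. This is a shortened type definition under assumptions: the identifier text_partial is made up, and most of the filters from the copied text type are omitted, so use it only to see where the new line goes.

```xml
<!-- schema_extra_types.xml: abbreviated sketch; "text_partial" is a made-up identifier. -->
<fieldType name="text_partial" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
    <!-- Added line: index every prefix of 2 to 25 characters of each token. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- No n-gram filter here: search keys are matched as typed against the indexed prefixes. -->
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
  </analyzer>
</fieldType>
```

With this in place, "breakpoint" is indexed as "br", "bre", "brea", "break" and so on, so a search for "break" can match it.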

Adding a new Search API data type

This requires a bit of custom code in addition to configuration changes, but it can make custom additions a lot more user-friendly. With Search API Solr Search, you can easily add new data types to the Search API index's "Fields" form. That way, you can make all of the above changes by simply selecting the data type for the field on the "Fields" form, as usual. Everything is displayed right in the admin UI, which also makes it easier to remember which fields have custom changes. Also, you don't need a field's Solr identifier to change its type.

To add the new type, first add it in Solr (as described above) and also add a dynamic field for it. (If it is a non-fulltext type, instead create two dynamic fields: a single-valued one whose prefix ends in s_* and a multi-valued one whose prefix ends in m_*.) Then, create a custom module (or use an existing one) and implement hook_search_api_data_type_info() (documented in the Search API's search_api.api.php file) with the extensions documented with search_api_solr_hook_search_api_data_type_info() (in this module's search_api_solr.api.php file).

For example, if you have created two new types, super_integer for integers and super_text for fulltext, you could use the following for the dynamic fields:

<dynamicField name="superi_s_*"  type="super_integer"  indexed="true"  stored="true" multiValued="false" />
<dynamicField name="superi_m_*"  type="super_integer"  indexed="true"  stored="true" multiValued="true" />

<dynamicField name="supert_*"  type="super_text"  indexed="true"  stored="true" multiValued="true" termVectors="true" />

Then your hook implementation should look like this (with MODULE being your custom module's name):

function MODULE_search_api_data_type_info() {
  return array(
    // You can use any identifier you want here, but it makes sense to use the
    // field type name from schema.xml.
    'super_text' => array(
      'name' => t('Super fulltext'),
      'fallback' => 'text',
      // Dynamic field "supert_*".
      'prefix' => 'supert',
      // Fulltext types are always multi-valued.
      'always multiValued' => TRUE,
    ),
    'super_integer' => array(
      'name' => t('Super integer'),
      'fallback' => 'integer',
      // Dynamic fields with name="superi_s_*" and name="superi_m_*".
      'prefix' => 'superi_',
    ),
  );
}

Using the correct Lucene version

Starting with Solr 3.x, it is possible (and mandatory) to specify the version of Lucene your Solr server should use. Since the module developers cannot know what version of Solr their users will be running, the default config files contain defaults for Solr 3.5, Solr 4.0 or Solr 5.0 (depending on config version), which will also work for all later versions (of the same major version, i.e., 3, 4 or 5).

However, for best performance, the latest bug fixes, etc., you should definitely use the latest version available to your server, which will be the version of Solr itself. This setting can easily be changed in the solrcore.properties file provided with the config files. Just change the value after the equals sign in the line that starts with solr.luceneMatchVersion=.
If you are using a version lower than Solr 5, instead of just specifying the Lucene version number, use the following format: first LUCENE_, then the major and minor version number you want to specify, without anything in between. So, for example, if you are using a Solr 4.2 server, the line in solrcore.properties should look like this:
solr.luceneMatchVersion=LUCENE_42
Never use versions higher than that of your Solr server, as Solr will then refuse to start.

Caution: Keep in mind that for some minor version updates, the format of the config files can change. This is especially the case for Solr 3.6. This means that you cannot use versions of 3.6 or later for this setting and still use the default config files provided with this module. That's also why the default setting for the 3.x configs is 3.5: it is the latest version that works with the provided 3.x config files.
If you are using Solr 3.6 or higher (but still 3.x), you should either leave the setting unchanged at LUCENE_35; or try to upgrade to Solr 4.x; or, if you are an advanced Solr user, use the correct Solr version and adapt the config files accordingly.

Finding out Solr's identifier for a field

For most of the above instructions, you'll need to find out the identifier Solr uses for a certain indexed field. For example, while the Search API machine name for the node's "Title" field is title, Solr will identify that field as tm_title.

If you have access to the Solr server's admin UI, probably the easiest way is to go to the "Schema browser" there and search for a field name that "looks right" (possibly after reading the detailed instructions below to get an idea of what the identifier should look like). If there are two similarly named fields, you can also open the detailed information for each field to check whether it has the expected number and range of values.

Otherwise, here are the exact steps to arrive at the Solr identifier of a field:

  1. Find out the field's machine name in the Search API, by looking it up on the index's "Fields" tab. (For Search API versions older than 1.7, you have to look it up in that page's HTML source code instead.)
  2. If you are using a Search API Solr Search module version newer than 1.1 and there is no "Clean field identifiers" form on your server's "Edit" page: replace all colons (:) in the identifier with dollar signs ($).
  3. Prepend the Solr type prefix. This depends on the type selected for the field, and whether the field is multi-valued or not. The first part of the prefix depends on the type, see the following table:
    Search API type           Prefix
    Fulltext                  t
    String                    s
    Integer                   i
    Decimal                   f
    Date                      d
    Duration                  i
    Boolean                   b
    URI                       s
    Latitude/longitude        loc
    Location area (WKT)       bbox

    Then, add s_ if the field is single-valued or m_ if the field is multi-valued. (In Search API 1.7 and newer, multi-valued fields are marked with a superscript "1" after their name.)
    That gives you the complete prefix, which you have to prepend to the identifier constructed so far to arrive at the complete Solr field identifier.
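As a worked example of these steps, consider the two fields used earlier on this page. The machine names comment_count and body:value are inferred from the identifiers is_comment_count and tm_body:value shown above, and the types tlong and text_ws are only illustrative choices.

```xml
<!-- schema_extra_fields.xml: sketch showing how the identifiers are derived. -->
<fields>
  <!-- "comment_count": Integer (i) plus single-valued (s_) gives "is_comment_count". -->
  <field name="is_comment_count" type="tlong" indexed="true" stored="true" multiValued="false" />
  <!-- "body:value": Fulltext (t) plus multi-valued (m_) gives "tm_body:value". -->
  <!-- If step 2 applies to your setup, the identifier is "tm_body$value" instead. -->
  <field name="tm_body:value" type="text_ws" indexed="true" stored="true" multiValued="true" termVectors="true" />
</fields>
```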

Another way to find the corresponding field name in the index is to call SearchApiSolrService::getFieldNames(SearchApiIndex $index, $reset = FALSE). For example, inside an implementation of hook_search_api_solr_query_alter(), you can use: $fields_names = $query->getIndex()->server()->getFieldNames($query->getIndex());.

Comments

angie.perez

"First, copy a text type definition to schema_extra_types.xml and change the identifier, as described above."
Where do I copy from?

ressa

First, copy the solr definition files from the search_api_solr module, where example.com is your web site:

cp /var/www/html/example.com/public_html/sites/all/modules/search_api_solr/solr-conf/5.x/* /var/solr/data/drupal/conf/

You will find schema_extra_types.xml in your /var/solr/data/drupal/conf/ folder now, and can adjust it to your liking.

Jack0140

Hi, I'm a beginner with Search API Solr. How should I set up Chinese searching on a multi-language site?
I added a custom field type to schema_extra_types.xml and changed the field type to my custom type, but it only partially works. Before I changed these settings, my site couldn't even find a Chinese word in the middle of a sentence, but now it can. The problem is that a search for "我在香港" should show results for “我”, “在”, “香港” and “我在香港”, but it only shows results for "我在香港". Could any other setting affect this? My other site, with different data, works.
Below is the code that I added to schema_extra_types.xml:

<fieldType name="text_zh" class="solr.TextField">
   <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer">
   </analyzer>
</fieldType>

Ann Itty

I am using Solr 7.6.0.
I ran the techproducts example that comes with the Solr installation bundle. In the Solr admin UI for the techproducts core, I can simply query for a text without specifying a field name. For a newly created core, however, after indexing the data I have to specify the field name, as in text:history, where text is the field name and history is the word I want to search for. What changes should I make in solrconfig.xml or managed-schema.xml (that is, schema.xml) so that I can do free-text querying in the new core (that is, by simply typing the word "history" in the query box)?