As I have explained in the issue queues several times, the Search API currently has a fixed set of (scalar) data types, and there are good reasons for that: service classes have to know in advance all the data types that might come at them, so adding new ones in contrib modules is pretty much out of the question.

Recently, though, I have become more and more convinced that something like this is in fact needed, mostly in the context of Geolocation searches (#1117464: Search API integration), though there are of course other use cases. The best way to do this currently seems to be to require new data type definitions to also name one of the default data types as a fallback, so that service classes which don't know the custom type can simply use that standard one (even though the stored information might then be quite useless). We could also (just coming up with this) allow a special value "complex" as the fallback (not as a normal data type) to tell service classes that don't know the type not to even bother. Or maybe we could accomplish the same thing by simply allowing custom data type definitions to leave out the fallback.

Hm, thinking about this, we could just require service classes to specify the custom data types they are supporting via special "search_api_type_$type" features in supportsFeature(). That way, the framework code could take care of substituting the fall-back data type, and we wouldn't have to change the API/contract for service classes (making this only an API addition, I guess).
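Just to make the idea concrete, here is a minimal sketch of how that substitution could work; the function, the array layout and the 'fallback' key are made-up illustrations, not existing API:

<?php
// Hypothetical sketch: decide which data type a given service class should
// see for a field, substituting the fallback for unknown custom types.
function example_resolve_data_type($type, array $custom_types, SearchApiServiceInterface $service) {
  // The fixed set of default (scalar) types every service class understands.
  $defaults = array('text', 'string', 'integer', 'decimal', 'date', 'duration', 'boolean', 'uri');
  if (in_array($type, $defaults, TRUE)
      || $service->supportsFeature('search_api_type_' . $type)) {
    // Default types, and custom types the service class declares support
    // for, can be passed on unchanged.
    return $type;
  }
  // Otherwise use the fallback named in the custom type's definition,
  // or 'string' as a last resort.
  return isset($custom_types[$type]['fallback']) ? $custom_types[$type]['fallback'] : 'string';
}
?>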


Comments

azinck’s picture

subscribe

Damien Tournoud’s picture

I would like to be able to add geofields to Search API too :) a.k.a. subscribe

mollux’s picture

Hello,
I have an implementation for custom data types (and geofield in particular).
I'm still cleaning up the code; I'm almost finished and hope to put it online in a couple of days.
You can see a demo at http://spatial.mollux.be; you can log in with demo/demo to add your own nodes with an address.

greetings,
Mollux

eticha’s picture

Hello
Your demo was great. Have you put it online?
Thanks

mollux’s picture

I just committed the code :)
You can find it here: http://drupal.org/sandbox/mollux/1269950

drunken monkey’s picture

OK, a slight problem … As I only now realized, explicitly specifying type support on the server side and automatically replacing the types accordingly when indexing only solves part of the problem. After all, modules can also check the fields (and their types) of an index directly, and frequently do (facets, Views, DB backend), and with this patch they can no longer rely on the fixed set of values that can occur there. Even though this is an API change anyway, I don't feel quite comfortable making changes that could heavily affect dependent modules …

So, two solutions for this, that I can think of:
1) Let users only select the field types that the current server actually supports. (Also automatically update this when changing an index's server, of course.) This wouldn't solve the problem for all modules, but at least for the backends, which is already something. (Also, this might be a good idea in any case, to avoid user confusion … or maybe it would be more confusing, I can't really tell.)
2) Don't use the "type" key but introduce a new one, in $index->options['fields'][$f]. Can't see anything wrong with that, I guess …

OK, 2) looks much better, but still wanted to give you guys the chance for input on this. Any opinions, or even other ideas?

azinck’s picture

@drunken monkey:
In the case of #1 shouldn't it be the implementing module's responsibility to decide what to do (probably just ignore the field) when encountering an unrecognized type? Even if you move forward with #2 I don't know how a module is going to do anything useful with the data from a field whose type it fundamentally doesn't understand.

I'm not familiar with the spectrum of search backends out there. Assuming you do go ahead with #2, is there the potential for a back-end to support a "sub-type" (for lack of a better word) that fails to map to one of your core types? Returning, for instance, an object or some such?

Damien Tournoud’s picture

So, here is a roadmap:

1. Change search_api_field_types() to return $type => array('label' => 'xxx', 'fallback type' => 'yyyy') instead of just the label, add a drupal_alter(), and add the associated documentation
2. Extend SearchApiServiceInterface to add a supportsType($type) method
3. Add logic in SearchApiIndex::__construct() to add a 'index_type' key to $index->options['fields'], based on the current index capabilities
4. Open issues to all the known indexers so that they use 'index_type' instead of 'type' during indexing
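A rough sketch of how step 3 could look, assuming the 'fallback type' key from step 1 and the supportsType() method from step 2 (all of this is hypothetical until implemented):

<?php
// Hypothetical sketch for step 3, somewhere in SearchApiIndex::__construct():
// derive an 'index_type' for every field from what the current server supports.
$types = search_api_field_types();
$server = $this->server();
foreach ($this->options['fields'] as $name => &$field) {
  $field['index_type'] = $field['type'];
  if ($server && !$server->supportsType($field['type'])
      && isset($types[$field['type']]['fallback type'])) {
    // The backend doesn't know this type: substitute the fallback from step 1.
    $field['index_type'] = $types[$field['type']]['fallback type'];
  }
}
unset($field);
?>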

Comments?

drunken monkey’s picture

> I'm not familiar with the spectrum of search backends out there. Assuming you do go ahead with #2, is there the potential for a back-end to support a "sub-type" (for lack of a better word) that fails to map to one of your core types? Returning, for instance, an object or some such?

No, I guess we'll have to explicitly forbid this. (Will write that when I update README.txt …) People can, e.g., use the object ID as the property value, and provide a load function.

> 1. Change search_api_field_types() to return $type => array('label' => 'xxx', 'fallback type' => 'yyyy') instead of just the label, add a drupal_alter(), and add the associated documentation

I did this a bit differently, but the basic idea is the same. See the attached patch.

By the way, can anyone think of anything else you should/could specify for a custom type?

> 2. Extend SearchApiServiceInterface to add a supportsType($type) method

I'd just do this with supportsFeature("search_api_data_type_$type").
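Inside a backend's service class, declaring support could then look roughly like this (hypothetical sketch, the type name is made up):

<?php
// Hypothetical sketch, inside a service class extending SearchApiAbstractService:
// declare support for a custom "geofield" type via the feature mechanism.
public function supportsFeature($feature) {
  return $feature == 'search_api_data_type_geofield'
    || parent::supportsFeature($feature);
}
?>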

> 3. Add logic in SearchApiIndex::__construct() to add a 'index_type' key to $index->options['fields'], based on the current index capabilities

Ah, good idea to add that logic there … I'd have just saved it permanently, but I guess your idea is preferable.

> 4. Open issues to all the known indexers so that they use 'index_type' instead of 'type' during indexing

During indexing, I'd just use the 'type' key as before, so service backends wouldn't have to adapt. In that case we can tell exactly what types will be known, anyways. (Oh, apart from processors, I guess … But I don't think that would be a problem.)

drunken monkey’s picture

Status: Active » Needs work
File size: 10.91 KB

Attached is a version which should work. (README.txt is missing, though.) If a custom data type is selected by the user, that type is stored in a special "real_type" entry while "type" contains the fallback type. During indexing, I replace "type" with "real_type" in those cases where the real type is known to the service class (as determined by $server->supportsFeature('search_api_data_type_' . $custom_type)). In all other cases, modules will just have to check the "real_type" key if they want to deal with custom data types.
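For illustration, a field's entry in the index options could then look roughly like this (field name and types are made up):

<?php
// Hypothetical example of a field entry after saving the index:
// 'type' holds the fallback type, 'real_type' the custom type the user chose.
$index->options['fields']['field_location'] = array(
  'type' => 'string',        // fallback, safe for every service class
  'real_type' => 'geofield', // used only when the server supports
                             // 'search_api_data_type_geofield'
);
?>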

@mollux: For your use case I'd now suggest the following (probably in its own module, e.g. search_api_geofield):
- Implement hook_search_api_data_type_info() to define a "geofield" type.
- Make sure the Entity API integration of Geofield returns a single value encoding all the information. (Remember that the type there has to be "text" or another primitive type defined by the Entity API, though. You cannot use the new custom type there.)
- Properly define in your module the format of the data for this type*.
- Patch the Solr service class / module to handle the new type.

* Now that I think about it, we don't have a way to enforce this format, in contrast to the default data types. The user can just select the type for any field, after all.
Idea: A third key for hook_search_api_data_type_info() to specify a conversion function. Given some data, it will either return the data in the valid format, or NULL if the data cannot be parsed. I guess the index would then have to do this conversion for all values with custom types, between field extraction and preprocessing.
Otherwise, we could just let service classes take care of this, like we (come to think of it, quite needlessly) do for the default types.
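For illustration, such a module's type definition could look roughly like this; the key names and the conversion callback are assumptions based on the ideas above, not the final API:

<?php
/**
 * Implements hook_search_api_data_type_info().
 *
 * Hypothetical sketch: defines a "geofield" data type with a default type as
 * fallback and the conversion callback idea discussed above.
 */
function search_api_geofield_search_api_data_type_info() {
  return array(
    'geofield' => array(
      'name' => t('Geofield (WKT)'),
      'fallback' => 'string',
      'conversion callback' => 'search_api_geofield_convert',
    ),
  );
}
?>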

matrixlord’s picture

+

tnightingale’s picture

subscribe

mollux’s picture

I'm working on the integration with the patch, and will commit it when it is implemented. First tests look good :)

phayes’s picture

+

mollux’s picture

I implemented and committed my code.
@drunken monkey, I made a small change to your patch (and attached the modified version) so that the real_type is stored in the index; otherwise the substitution has to be implemented again in every server backend. If the custom type passes the supportsFeature() check for the index's server, it can be stored permanently as $index->options['fields'][$field]['type'].
I will open a new issue for the Solr backend in the search_api_solr issue queue.

Your last point is a good idea: provide a conversion function and check it in the index. I think we could even throw an error if the type is used for a value it can't handle, but I have to think about whether that's a good idea :)

drunken monkey’s picture

> @drunken monkey, I made a small change to your patch (and attached the modified version) so that the real_type is stored in the index; otherwise the substitution has to be implemented again in every server backend. If the custom type passes the supportsFeature() check for the index's server, it can be stored permanently as $index->options['fields'][$field]['type'].

No, we cannot do this, as data alterations, hook implementations, etc., still might not recognize the type. On the other hand, the backend doesn't have to use supportsFeature() anyways, as it already knows which types it supports. Additionally, not all backends need type information outside of the indexing process.
I agree, it is a bit of an overhead, but I don't think it can be avoided easily. We could only shift it from backends to all other modules, and I don't think that's a good idea. (Or we could introduce a third type key for indexed fields, but that would be really confusing.)

> Your last point is a good idea: provide a conversion function and check it in the index. I think we could even throw an error if the type is used for a value it can't handle, but I have to think about whether that's a good idea :)

I guess if it can't be converted we should just return NULL as if the field had no value. No need to prevent the whole item from indexing just because of an inappropriate type.
But OK, if you also think this is a good idea, let's add a conversion function.
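A minimal sketch of such a conversion callback (signature and data format are assumptions):

<?php
/**
 * Hypothetical conversion callback for a custom "geofield" type.
 *
 * Returns the value in the agreed-upon format, or NULL if it cannot be
 * parsed, so the field is simply indexed as if it had no value.
 */
function search_api_geofield_convert($value) {
  // Assume the agreed-upon format is a WKT "POINT(lon lat)" string.
  if (is_string($value) && preg_match('/^POINT\(-?[0-9.]+ -?[0-9.]+\)$/', $value)) {
    return $value;
  }
  return NULL;
}
?>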

drunken monkey’s picture

Status: Needs work » Needs review
File size: 15.75 KB

+ conversion callback
+ explanation in README

As far as I'm concerned, there are no more TODOs here.

drunken monkey’s picture

@ mollux: Can you confirm this works for you? Then I can finally commit this.

drunken monkey’s picture

Re-roll attached.

If no one objects by Monday, this will be committed. I can't wait forever here.

mollux’s picture

@drunken monkey, I haven't had any time in the last month to work on spatial search...
I will adapt the code of spatial search tonight to make sure it works with the code you committed.

mollux’s picture

@drunken monkey, it works :)
I have the proof (and code :)) on my laptop; I will commit it ASAP.

drunken monkey’s picture

Status: Needs review » Fixed

Excellent, thanks for testing!
Committed.

Automatically closed -- issue fixed for 2 weeks with no activity.