We have node types that have multi-value fields. These may be term fields or any other type of field. We need to show this data in tabular form and make it sortable. To do this we need a single node to show up multiple times in the search results, once for each possible combination of multi-value field values. With plain MySQL this is easy to do, but we haven't identified a way to do it with search_api/search_api_solr. We'd prefer to accomplish this with a reusable/releasable approach rather than a one-off hack.
This is a common problem for things like product displays in ecommerce. I believe that I've seen a discussion in the commerce project about using 'display' nodes to handle this. That's messy and I'd rather avoid it. However, it got me thinking about creating a new index with pseudo nodes for each variant. The pseudo nodes would never exist, but would ultimately map back to real nodes. The trick here is how to keep track of all the possible variants. If the node values change we'd have to prune some of the variants, or just regenerate them all. We can use the search_api hooks to pull from the special index rather than the normal index, and ultimately return results as if they came from the normal index.
Any ideas on how to accomplish this in a clean fashion?
Comment | File | Size | Author |
---|---|---|---|
#27 | Screenshot 2016-02-08 12.34.38.png | 247.11 KB | regal |
#27 | Screenshot 2016-02-08 12.37.43.png | 260.25 KB | regal |
#18 | interdiff-search_api-1760706-16-18-do-not-test.diff | 1.29 KB | das-peter |
#18 | search_api-1760706-18.patch | 19.06 KB | das-peter |
#16 | search_api-1760706-16.patch | 18.71 KB | das-peter |
Comments
Comment #1
rwohleb commented:
After talking with another dev, I think I need to clarify this a bit. Here is an example structure of a node (JSON form):
The resulting search results should include:
You can see how we want the node listed multiple times with each possible combination displayed. We need to be able to sort these variants.
Comment #2
ohthehugemanatee commented:
Working on the same problem. :)
Basically the problem is that we want to emulate a relational data manipulation in a non-relational DB.
To explain: all Views does with MySQL to produce this result is a LEFT JOIN, which works because of the way we store field data. Each entry in the field gets its own row in field_my_field, with eid, entity type, and revision id indexes. This way we can do LEFT JOIN field_my_field ON node.nid = field_my_field.entity_id AND (field_my_field.entity_type = 'node'). This is a clever use of a relational database. Solr is not a relational database, though. AFAIK our schema (and Solr's model in general) is flat; it just treats each node as a single object. I was trying to think about how we could more closely map our data structure to Solr to get this done, but the problem is that, though Solr does SOME basic relational stuff, it doesn't do relational JOINs at all. I think that doing this the "normal" Views way (i.e., in the DB layer) is probably not possible as long as the DB is Solr.
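To make the join behaviour described above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two tables are a heavily simplified, hypothetical stand-in for Drupal's node and field storage tables (the column names and sample rows are assumptions, not the real schema). The only point is that a LEFT JOIN against a one-row-per-value field table repeats the node once per value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables; not Drupal's actual schema.
    CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE field_my_field (
        entity_type TEXT,
        entity_id INTEGER,
        delta INTEGER,
        field_my_field_value TEXT
    );
    INSERT INTO node VALUES (1, 'Shirt'), (2, 'Hat');
    INSERT INTO field_my_field VALUES
        ('node', 1, 0, 'red'),
        ('node', 1, 1, 'green'),
        ('node', 1, 2, 'blue');
""")

# Node 1 has three field values, node 2 has none.
rows = conn.execute("""
    SELECT node.nid, node.title, field_my_field.field_my_field_value
    FROM node
    LEFT JOIN field_my_field
        ON node.nid = field_my_field.entity_id
        AND field_my_field.entity_type = 'node'
    ORDER BY node.nid, field_my_field.delta
""").fetchall()

for row in rows:
    print(row)
```

Node 1 comes back three times (once per value) and node 2 comes back once with an empty value, which is the row multiplication a flat index cannot produce at query time.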
Given that we can't do this in the data storage layer, another approach would be to change our schema when this is the case, i.e., if you want a multi-value field to create multiple objects in display, we actually store it as multiple objects in the Solr index. That's the only way I can think of to get our desired result out of a flat data model. In fact, this is the same approach as what commerce does with data displays; we're just doing it across layers.
So just as in Views we have a checkbox for "display all values in the same row", in our Search Index Fields tab we could have a checkbox for "display all values in the same result". If any of these are unchecked, we create a separate object in the Solr index for each combination of single values in the unchecked fields. Then we can let Solr do its thing normally.
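The "one object per combination" idea can be sketched as a Cartesian product over the unchecked fields. This is a language-neutral illustration in Python, not Search API code: the function name `denormalize`, the field names, and the `entity_id` / per-variant `id` keys are all hypothetical, and multi-value fields are assumed to be lists.

```python
from itertools import product


def denormalize(item, split_fields):
    """Return one flat document per combination of the split fields."""
    # Treat a missing or empty field as a single "no value" entry so the
    # item still yields at least one document.
    value_lists = [item.get(name) or [None] for name in split_fields]
    docs = []
    for n, combo in enumerate(product(*value_lists)):
        doc = dict(item)                      # other fields are copied as-is
        doc.update(zip(split_fields, combo))  # one value per split field
        doc["entity_id"] = item["id"]         # maps back to the real node
        doc["id"] = f"{item['id']}-{n}"       # hypothetical per-variant id
        docs.append(doc)
    return docs


node = {
    "id": 12,
    "title": "Example node",
    "field_color": ["red", "green", "blue"],
    "field_size": ["S", "M", "L"],
    "field_tags": ["a", "b"],  # left multi-valued ("same result" checked)
}
docs = denormalize(node, ["field_color", "field_size"])
```

Each resulting document carries single values for the split fields, so the backend can sort on them directly. When the node changes, the simplest upkeep is probably to delete every document with that `entity_id` and regenerate them all.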
Thoughts?
Comment #3
Coyote commented:
This would be most desirable behavior. It's very easy to get multiple rows with a content view, but we have a need for the same functionality with a (faceted) search view.
Comment #4
jgraham commented:
Working denormalized search index
I have a sandbox project over here http://drupal.org/sandbox/jgraham/1777454
This creates an alternate denormalized entity search index. This works great with search_api_solr to push denormalized node entities into the Solr index. Everything makes it into the index as expected, e.g. for the example in comment 2 we get 9 Solr entries from our one node.
Issues
Getting the results back out is less successful. I can get it to indicate that it found matching documents in the Solr search index, but I'm running into the limitation that search_api assumes that the keys returned from search_api_meta_entity_search_api_item_type_info() represent entity types. To avoid stomping on search_api's standard entity implementation we can't use actual entity type names, so the code in my sandbox above uses 'denormalized-ENTITY_NAME' as the item_type keys. This, combined with setting the 'type' attribute to the actual entity_type in our SearchApiDataSourceControllerInterface::__construct(), gets us pretty far, but the instances of search_index fall back and use our 'denormalized-ENTITY_NAME' as the 'item_type', which in turn gets used in various places under the assumption that it *is* the entity_type.
It seems like it would be great to decouple the assumption about what defines a datasource as an entity or not. That is, the index should defer to the datasource about what the entity type is, if any. The index should not make the assumption that the datasource key is the entity type.
Proposed solution
- Allow hook_search_api_item_type_info() to declare the entity type of an item type.
- Extend the SearchApiDataSourceControllerInterface interface to include a method getEntityType() that returns either '' or the entity_type if defined in hook_search_api_item_type_info().
- Change code that uses $index->item_type or $this->index->item_type as the entity type to instead call $index->datasource()->getEntityType().
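The decoupling proposed above can be illustrated with a small language-neutral model in Python. These classes are hypothetical stand-ins, not the Search API classes: `DataSource`, `Index`, and `load_type` exist only to show the index asking its datasource for the entity type instead of assuming the item type key is one.

```python
class DataSource:
    """Hypothetical stand-in for a datasource controller."""

    def __init__(self, item_type, entity_type=""):
        self.item_type = item_type
        self._entity_type = entity_type

    def get_entity_type(self):
        # '' means "these items are not entities", as in the proposal.
        return self._entity_type


class Index:
    """Hypothetical stand-in for a search index."""

    def __init__(self, datasource):
        self._datasource = datasource
        self.item_type = datasource.item_type

    def datasource(self):
        return self._datasource


def load_type(index):
    """Entity type to use for loading, or '' when items are not entities."""
    # Ask the datasource rather than assuming item_type is an entity type.
    return index.datasource().get_entity_type()


plain = Index(DataSource("node", entity_type="node"))
denorm = Index(DataSource("denormalized-node", entity_type="node"))
external = Index(DataSource("external_feed"))

print(load_type(plain), load_type(denorm), repr(load_type(external)))
```

With this split, an alternate index keyed 'denormalized-node' can still report 'node' as its entity type, while a datasource of non-entities reports none.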
I'm hoping someone more familiar with search_api, or one of the search_api maintainers, can chime in and indicate whether the above approach makes sense, and whether a patch accomplishing the above would be accepted or there is any interest in decoupling the assumptions about entity_type. I think that there could be other use cases for creating alternate indexes for various entity types that behave differently from the default indexes provided by search_api. Any other potential approaches that would allow an alternate search index like the one in my sandbox to leverage the rest of search_api would also be appreciated.
Comment #5
jgraham commented:
Attached patch implements the proposed solution in comment 4.
With the attached patch I can get results back via search_api_page, and search_api_views. They both make additional assumptions about what is an entity id and fail loading our proper items. Perhaps our datasource can be improved to create an etid entry that the various search api display modules can use as a loading id rather than making the assumption that the id returned is the entity id.
Also attached is a corresponding patch for search_api_page(), which does the rudimentary steps in comment 4, but this still needs work.
Regardless of whether or not this denormalized Solr approach is fruitful, it seems like the patch to remove assumptions about entity_type could have generic usefulness.
Comment #7
jgraham commented:
Adjusted patch (without the search_api_page patch); this one is now working, with the denormalized results displaying in a Views search.
There is a section around line 276 in contrib/search_api_views/includes/query.inc that we can hopefully adjust, as it is not the most performant option; however, this is the line that lets us load our denormalized entries rather than the normal full normalized entity. This was tested at commit 0f213681484ad20d0eb4388195f5c8d69b644779 from the sandboxed project linked in comment 4 with a Solr search backend.
Screenshot attached to show facets working alongside denormalized results for two distinct nodes resulting in 2016 permutations.
Comment #8
das-peter commented:
Damn, just discovered this issue here. I was doing something similar here #1783332: Use the $result['id'] instead the array key in SearchApiViewsQuery::addResults()
Sandbox: http://drupal.org/sandbox/daspeter/1783280
Comment #9
das-peter commented:
Replaced some other occurrences of $index->item_type and added getEntityType() to the interface definition.
Let's see if this passes the tests.
Comment #10
das-peter commented:
Changing to feature request :)
Comment #11
drunken monkey commented:
Yes, I definitely think this extension makes sense! I'm always for adding more genericity; it's a pity I didn't think about this as a restriction right away …
This is much too invasive for such a niche feature. Being able to display results without loading the entities was one of the key requirements for the Views integration, which I don't want to throw away. Especially since that could get in the way of other niche features (data sources which don't implement item loading).
There are many occasions like this, which simply don't work when the type isn't an entity – in this case, e.g., this will always return TRUE because passing an empty parameter to entity_get_info() results in all entity infos being returned, not an empty result.
In this example, you could just use (boolean) $this->index->datasource()->getEntityType() instead.
Please search the patch for other code like this and always think about what happens for an index of non-entities (as well as other edge cases, if possible).
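The pitfall described here can be modelled in a few lines. This is a simplified Python model, not Drupal code: `ENTITY_INFO` and both `is_entity_*` helpers are hypothetical, and `entity_get_info()` below only imitates the one behaviour mentioned above (an empty argument returns the info for all entity types).

```python
ENTITY_INFO = {"node": {"label": "Node"}, "user": {"label": "User"}}


def entity_get_info(entity_type=None):
    """Simplified model: an empty argument returns info for all types."""
    if not entity_type:
        return ENTITY_INFO
    return ENTITY_INFO.get(entity_type)


def is_entity_buggy(entity_type):
    # For '' or None this sees the full info array, which is truthy.
    return bool(entity_get_info(entity_type))


def is_entity_fixed(entity_type):
    # Check the datasource's answer itself instead of looking it up.
    return bool(entity_type)


for value in ("node", "", None):
    print(repr(value), is_entity_buggy(value), is_entity_fixed(value))
```

For an index of non-entities the first check reports an entity anyway, which is why testing the return value of getEntityType() directly is the safer form.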
Also, I think we should add a getEntityType() method to the index class, which passes the call to the datasource controller. Just a bit shorter to write.
I'd also use NULL instead of an empty string as the return value for non-entities.
Oh, and in the entity datasource controller, you don't have to call the method, just use the property directly!
Please make these changes and I'll look at the patch in more detail.
Comment #12
das-peter commented:
@drunken monkey Thank you very much for your feedback. I've adjusted the patch accordingly.
Comment #13
das-peter commented:
Found a potential issue - actually it struck me in my special setup.
Comment #14
heyyo commented:
Could we use this patch with Search API database or just with Solr?
Could you provide any guidelines on how to use it?
Comment #16
das-peter commented:
Here's a re-roll. I hope I found all the changes ;)
Comment #18
das-peter commented:
Looks like the method getEntityType() in the index class was lost - re-added.
Comment #19
drunken monkey commented:
Renaming this and tagging it appropriately. (The API change is not completely backwards-compatible in that the datasource controller interface changed.)
Will test/review later.
Comment #20
drunken monkey commented:
OK, the “later” admittedly turned out to be a lot later – sorry. Anyways, here's a revised version of the patch. It lacked some documentation and also used the new method in several places where the plain item type needs to be used. It also often didn't take into account that the method's return value can be empty, which would probably have led to some weird bugs.
So, could you please test this with your setup, does it work for you?
Comment #21
das-peter commented:
I gave it a try. Unfortunately I wasn't able to test it with "Search API Denormalized Entity Index", but I think that was caused by my odd existing setup and not the patch ;)
The only things I found were some inconsistencies in the API documentation.
I've adjusted that and here's the adjusted patch.
Comment #22
drunken monkey commented:
Ah, thanks for catching that!
However, I think we should at least have someone test this successfully with a custom datasource with entities before we commit this. Otherwise that part is completely untested.
Comment #23
das-peter commented:
Do you know someone with such a custom entity datasource, so I can bother them? ;)
Comment #24
drunken monkey commented:
Good question. jgraham and rwohleb don't seem interested any more. I'd have to search through the issue queue to find someone. Actually I'd hoped you'd get it running with your setup, since you're the one who wants this patch committed.
But in the end, the patch doesn't add a regression in any case, so I guess we can just commit it and wait for someone to complain if they want to use it. Or maybe you'll find some flaw, or succeed in making it work with this patch.
So, committed. Thanks for all your work, everyone!
Comment #25
das-peter commented:
Thank you very much for committing it - I'll update my version asap and I'll complain when it causes an error :P
Comment #27
regal commented:
When I create an indexed search with a repeating date field, I am unable to uncheck the "Display all values in the same row" option.
I'm not a developer, so I couldn't get everything above, but it seemed like whatever was resolved was committed to the current module.
Can you explain how I can index a repeating date field so I'm able to see the event node appear multiple times in the view?
I'm using the current version of Search API and Views and Solr.
UPDATE: I see that this is a known limitation. I'm trying to see if Date Repeat Entity can help this issue.