Apologies if this has come up before. It's been mentioned that the apachesolr module offers a "more like this" block (and some other features). Since Search API uses a different Solr schema, I'd need two indexes, which seems like overkill. So my feature request is: please, please, please can we have a "more like this" block for Search API?


Comments

drunken monkey

Sure, sounds like a good idea.

I'm thinking of a custom "mlt" feature that servers could support (Solr would then of course support it). This could probably even go into the core Search API project, if it isn't too large. Maybe just a Views argument handler? That should be possible, but I'd have to look into it.

As there are currently several other features planned, this might take a while, though.

miiimooo

If I wanted to implement this as a patch, where would I best start? Or is it too intricate to do without understanding Search API completely?

drunken monkey

Well, you'd first have to think about how this could work. My thought at the moment would be, as said, to use a Views argument handler. Then you should probably define a new feature (a simple, unique string that will be handed to SearchApiServiceInterface::supportsFeature()) that adds a method for doing an MLT search instead of a normal one. You'd then implement the feature for the Solr backend (you can post both patches here, though) and, in the Views argument handler, somehow manage to call the MLT search function instead of the normal one. search_api_extract_fields() can be used to extract the data from the entity that comes in via the argument, as that's probably what should be handed to the server.
The only part I'd really have to think about is how to let the query class call a different function. That will probably need some added functionality there, for enough flexibility.
I don't think you have to understand the Search API completely; this probably doesn't touch much of it anyway.
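A minimal, framework-free sketch of the feature-string idea above follows. It assumes a backend only has to answer supportsFeature(). The interface and class names and the feature string `search_api_mlt` are hypothetical placeholders, not the real Search API classes. Only the supportsFeature() method name comes from this comment.

```php
<?php
// Hypothetical, stripped-down stand-in for SearchApiServiceInterface:
// a backend only has to answer whether it supports a named feature.
interface MiniServiceInterface {
  public function supportsFeature($feature);
}

// Hypothetical Solr-like backend that advertises the (assumed) MLT feature.
class MiniSolrService implements MiniServiceInterface {
  public function supportsFeature($feature) {
    return in_array($feature, array('search_api_mlt'), TRUE);
  }
}

// Hypothetical backend without MLT support.
class MiniDbService implements MiniServiceInterface {
  public function supportsFeature($feature) {
    return FALSE;
  }
}

// Callers check the feature string before attempting an MLT search.
function mini_mlt_available(MiniServiceInterface $server) {
  return $server->supportsFeature('search_api_mlt');
}

var_dump(mini_mlt_available(new MiniSolrService())); // bool(true)
var_dump(mini_mlt_available(new MiniDbService()));   // bool(false)
```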

miiimooo

I've partly implemented the argument handler the way you described (I could post a patch but it's trivial) but I can't see a way to properly pass the needed parameters through to the solr service's search functions.
I suppose I could implement the search_api_solr_query alter hook (drupal_alter) in a new module, but I guess it would be cleaner to just have a way of passing mlt=true and the required mlt parameters from the argument handler to the Solr service.

EDIT: I realise there is more to it than I thought. I tried using the same query string that the apachesolr module uses, but that just says "no such method mlt", so I think this has to do with the schema. I've tested a couple of variations, because I was thinking you could just pass the ID to Solr and it would figure out how to find similar content. I've been trying these:

q=id:1&mlt=true&mlt.fl=*&mlt.mindf=1&mlt.mintf=1&fl=id,score
q=1&mlt=true&mlt.fl=*&mlt.mindf=1&mlt.mintf=1&fl=id,score
q=resources_index-1&mlt=true&mlt.fl=*&mlt.mindf=1&mlt.mintf=1&fl=id,score
q=id:resources_index-1&mlt=true&mlt.fl=*&mlt.mindf=1&mlt.mintf=1&fl=id,score

But it always returns numFound=0.

Any ideas what I should be doing?

drunken monkey

Title: more like this » Add a "More like this" feature

Hm, not sure about this. My first thought was that an extra function on the service class would probably be best. Then you'd need some added flexibility in the Views query class that would allow you to call this other function instead of executing the query normally. E.g., instead of calling $this->query->execute(), the query plugin class could call registered handlers and check whether one of them executes a search instead. The argument handler would then just have to call something like $this->query->registerSearchHandler(array($this, 'handleSearch')) and do the MLT search request in its handleSearch() method. This would certainly be possible, and not even that hard, I guess.
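Here is a rough plain-PHP sketch of the handler-registration idea. `MiniViewsQuery`, `MiniMltArgument` and the convention that a handler returns NULL to decline are all assumptions. registerSearchHandler() and handleSearch() are only the names floated above, not an existing Views or Search API method.

```php
<?php
// Hypothetical Views query plugin: before running the normal search it
// asks registered handlers whether one of them executes the search itself.
class MiniViewsQuery {
  private $searchHandlers = array();

  public function registerSearchHandler(callable $handler) {
    $this->searchHandlers[] = $handler;
  }

  public function execute() {
    foreach ($this->searchHandlers as $handler) {
      // A handler returns NULL to decline, or an array of results.
      $results = call_user_func($handler, $this);
      if ($results !== NULL) {
        return $results;
      }
    }
    return $this->executeNormalSearch();
  }

  protected function executeNormalSearch() {
    return array('source' => 'normal');
  }
}

// Hypothetical MLT argument handler exposing a handleSearch() callback.
class MiniMltArgument {
  private $itemId;

  public function __construct($item_id) {
    $this->itemId = $item_id;
  }

  public function handleSearch(MiniViewsQuery $query) {
    return array('source' => 'mlt', 'item' => $this->itemId);
  }
}

$query = new MiniViewsQuery();
$result = $query->execute();
var_dump($result['source']); // string(6) "normal"
$query->registerSearchHandler(array(new MiniMltArgument(1), 'handleSearch'));
$result = $query->execute();
var_dump($result['source']); // string(3) "mlt"
```

The cost of this design is an extra extension point in the query plugin, which is what the next paragraph weighs against a simple query option.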
Without modifying the Views query plugin, this would be harder, I guess. One thing I could think of would be to simply make MLT an additional option on the search query class (not the Views plugin, but SearchApiQuery). The argument handler would then just have to call $this->query->addOption('search_api_mlt', $data) with the necessary data. I'm just not sure whether it's a good idea to let this functionality live directly in the search() method of the service class instead of a separate one, as it does modify the results after all. However, if it should also be possible to add normal filters and fulltext searches to an MLT request (which makes sense, but I don't know whether Solr even supports this), it would make sense again.
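The following sketch shows the option-based alternative. It assumes a query object with a setOption()/getOption() pair. setOption() is the name used later in this thread, while getOption() and both mini classes here are hypothetical stand-ins. The point is that one search() method can branch on whether the MLT option is present.

```php
<?php
// Hypothetical minimal query object holding named options.
class MiniQuery {
  private $options = array();

  public function setOption($name, $value) {
    $this->options[$name] = $value;
  }

  public function getOption($name, $default = NULL) {
    return isset($this->options[$name]) ? $this->options[$name] : $default;
  }
}

// Hypothetical backend: a single search() method that switches to an MLT
// request only when the (assumed) 'search_api_mlt' option is set.
class MiniBackend {
  public function search(MiniQuery $query) {
    $mlt = $query->getOption('search_api_mlt');
    if ($mlt !== NULL) {
      return 'mlt:' . $mlt['id'];
    }
    return 'normal';
  }
}

$backend = new MiniBackend();
$query = new MiniQuery();
echo $backend->search($query), "\n"; // normal
$query->setOption('search_api_mlt', array('id' => 1));
echo $backend->search($query), "\n"; // mlt:1
```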

I don't know, am I making any sense to you here? If so, what is your opinion and what are your arguments? This is, after all, an architectural decision about how this feature should be handled, which should better be thought through thoroughly. (OK, I added the adverb only because the combination looks funny.)
Give me one or two days to decide this, then I'll also know how it should be implemented.

In any case, you shouldn't alter the Solr query directly; this should be implemented generically, not Solr-specifically. Pass the basic field values to the service class somehow and let it figure out how to get appropriate results back.

By the way, I just added this to my GSoC proposal. So, provided I'm accepted, I'd implement this myself this summer (July, probably). If you implement it before then, that would of course be great too, and I'd just tie up possible loose ends or add documentation/tests. ;)

Oh, and as for your edited problem: maybe the mlt handler is disabled in solrconfig.xml? Gotta go now, no time to look myself, but that would be my guess.

miiimooo

I thought it would be "as simple as" adding something like a query-type option to the call to SearchApiQueryInterface. I don't completely understand your more generic points, partly because I'm looking for a solution for a very narrow case. My code so far theoretically allows specifying which fields to use for MLT, and would then add these as mlt.qf parameters to the query. Probably this is not good.

I'm still stuck with the Solr side of things, though. I've enabled the additional MLT handler in solrconfig.xml, but passing the item_id to get it to return similar content still returns nothing. I don't know enough about Solr.

For a head start, here are the diffs for the Views part:

diff --git a/contrib/search_api_views/includes/handler_argument_more_like_this.inc b/contrib/search_api_views/includes/handler_argument_more_like_this.inc
new file mode 100644
index 0000000..a8e79e7
--- /dev/null
+++ b/contrib/search_api_views/includes/handler_argument_more_like_this.inc
@@ -0,0 +1,89 @@
+<?php
+
+/**
+ * Views argument handler class for finding similar content ("More like this").
+ */
+class SearchApiViewsHandlerArgumentMoreLikeThis extends SearchApiViewsHandlerArgument {
+
+  /**
+   * Get the title this argument will assign the view, given the argument.
+   *
+   * This usually needs to be overridden to provide a proper title.
+   */
+  public function title() {
+    dsm($this);
+    return t('Search for "@arg"', array('@field' => $this->definition['title'], '@arg' => $this->argument));
+  }
+  /**
+   * Specify the options this filter uses.
+   */
+  public function option_definition() {
+    $options = parent::option_definition();
+    $options['fields'] = array('default' => array());
+    return $options;
+  }
+
+  /**
+   * Extend the options form a bit.
+   */
+  public function options_form(array &$form, array &$form_state) {
+    parent::options_form($form, $form_state);
+
+    $index = search_api_index_load(substr($this->table, 17));
+// while the argument gets added query isn't set
+//     $index = search_api_index_load($this->query->getIndex());
+    if (!empty($index->options['fields'])) {
+      $fields = array();
+      foreach ($index->options['fields'] as $key => $field) {
+        if ((search_api_is_text_type($field['type'])
+          || search_api_is_list_type($field['type'])) && ($field['indexed'])) {
+          $fields[$key] = $key;
+        }
+      }
+    }
+    if (!empty($fields)) {
+      $form['fields'] = array(
+        '#type' => 'select',
+        '#title' => t('Fields for Similarity (not implemented)'),
+        '#description' => t('Select the fields that will be used for finding similar content. If no fields are selected, all available fields will be searched.'),
+        '#options' => $fields,
+//         '#size' => min(4, count($fields)),
+        '#multiple' => TRUE,
+        '#default_value' => $this->options['fields'],
+      );
+    }
+    else {
+      $form['fields'] = array(
+        '#type' => 'value',
+        '#value' => array(),
+      );
+    }
+  }
+
+  /**
+   * Set up the query for this argument.
+   *
+   * The argument sent may be found at $this->argument.
+   */
+  public function query() {
+    ///TODO this is not proper
+    $server = search_api_server_load($this->query->getIndex()->server);
+    if (!$server->supportsFeature("search_api_more_like_this")) {
+//       dsm("doesn't support mlt");
+//       $this->query->keys("bla");
+      return;
+    }
+    $nid = $this->argument;
+    $node = node_load($nid);
+//     $this->query->fields(array("t_title"));
+    $this->query->keys($node->title);
+//     $this->query->keys($node->title);
+//     $this->query->keys("qt=mlt");
+//     $this->query->keys("similar to $nid");
+//     $this->query->condition("mlt", "true");
+    dsm((array)$this->query->getSearchApiQuery());
+    dsm($this->query->getIndex());
+    return;
+  }
+
+}
diff --git a/contrib/search_api_views/search_api_views.info b/contrib/search_api_views/search_api_views.info
index c77da3b..2e4e0aa 100644
--- a/contrib/search_api_views/search_api_views.info
+++ b/contrib/search_api_views/search_api_views.info
@@ -10,12 +10,12 @@ package = Search
 files[] = includes/display_facet_block.inc
 files[] = includes/handler_argument.inc
 files[] = includes/handler_argument_fulltext.inc
+files[] = includes/handler_argument_more_like_this.inc
 files[] = includes/handler_argument_text.inc
 files[] = includes/handler_field.inc
 files[] = includes/handler_field_boolean.inc
 files[] = includes/handler_field_date.inc
 files[] = includes/handler_field_duration.inc
 files[] = includes/handler_field_options.inc
 files[] = includes/handler_filter.inc
 files[] = includes/handler_filter_boolean.inc
diff --git a/contrib/search_api_views/search_api_views.views.inc b/contrib/search_api_views/search_api_views.views.inc
index 1a5e9c1..69c795a 100644
--- a/contrib/search_api_views/search_api_views.views.inc
+++ b/contrib/search_api_views/search_api_views.views.inc
@@ -167,6 +164,14 @@ function search_api_views_views_data() {
     $table['search_api_views_fulltext']['type'] = 'text';
     $table['search_api_views_fulltext']['filter']['handler'] = 'SearchApiViewsHandlerFilterFulltext';
     $table['search_api_views_fulltext']['argument']['handler'] = 'SearchApiViewsHandlerArgumentFulltext';
+
+    $table['search_api_views_more_like_this']['group'] = t('Search');
+    $table['search_api_views_more_like_this']['title'] = t('More Like This');
+    $table['search_api_views_more_like_this']['help'] = t('Find similar content.');
+    $table['search_api_views_more_like_this']['type'] = 'text';
+//     $table['search_api_views_more_like_this']['filter']['handler'] = 'SearchApiViewsHandlerFilterFulltext';
+    $table['search_api_views_more_like_this']['argument']['handler'] = 'SearchApiViewsHandlerArgumentMoreLikeThis';
+

I was about to upload a proper patch but realised the master branch had changed, so these may or may not apply.

drunken monkey

For the Solr part I can't really help you. I haven't done this myself yet either, so I couldn't do much more than experiment (and read the Solr documentation) until it works. I just tried it out quickly, but I also didn't get any results back.

For the argument handler, as I said: don't try to set the Solr-specific options directly. Instead, set the generic information (entity ID, fields to include) on the query as options with setOption() (condition() is for completely different purposes and will most likely complain if just handed "mlt"). Then implement the translation from that information to the Solr query parameters inside the Solr service class's search() method.

I'm now rather sure that letting MLT queries work with the normal search() method and query options is the better way to do this, so that part will at least be simpler.
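The translation step could look roughly like the sketch below. It assumes the generic option carries the item ID plus Search API field names, and that the backend has some map from Search API field names to Solr field names. The `t_` names mirror ones seen in this thread, but `$field_map` and the helper are hypothetical. Fields without a Solr counterpart are skipped.

```php
<?php
// Hypothetical translation inside a Solr backend's search(): turn the
// generic MLT option into Solr request parameters. $field_map stands in
// for the backend's Search API field name => Solr field name mapping.
function mini_mlt_to_solr_params(array $mlt, array $field_map) {
  $solr_fields = array();
  foreach ($mlt['fields'] as $field) {
    // Skip fields the backend doesn't know how to map.
    if (isset($field_map[$field])) {
      $solr_fields[] = $field_map[$field];
    }
  }
  return array(
    'qt' => 'mlt',
    'mlt.fl' => implode(',', $solr_fields),
  );
}

$mlt = array('id' => 1, 'fields' => array('title', 'body:value', 'unknown'));
$field_map = array('title' => 't_title', 'body:value' => 't_body:value');
print_r(mini_mlt_to_solr_params($mlt, $field_map));
```

Selecting the source document itself (the `q` parameter) is a separate step that only needs the item ID.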

pcambra

subscribe

7wonders

subscribe

miiimooo

For the Solr part I can't really help you

That's a pity. Maybe someone else here understands the Solr MLT feature. I think I've tried all possible combinations, but I keep getting either exactly one result (the document itself) or zero results. I've also tried adding the MLT handler to solrconfig.xml, but it's the same. Maybe a case for the solr-users mailing list...

My queries look like this:
select?qf=t_title&q=id:resources_index-1&mlt=true&mlt.fl=t_title,id&mlt.qf=t_title
and responses look like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="qf">t_title</str>
      <str name="mlt">true</str>
      <str name="mlt.fl">t_title,id</str>
      <str name="q">id:resources_index-1</str>
      <str name="mlt.qf">t_title</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" maxScore="0.0"/>
  <lst name="moreLikeThis"/>
</response>

drunken monkey

Just an idea: Maybe the used fields need to be set to "stored" for this to work? Try changing it in the schema.xml (restart Solr, re-index, etc.) and see if this changes anything.
Sorry that I didn't come up with this earlier (if it works).

miiimooo

Thanks for this. I changed this:

<dynamicField name="t_*" type="text" stored="true" indexed="true" />

then restarted Jetty, dropped the index and re-indexed, but still got the same result...

miiimooo

Well as you may have guessed I'm not an expert on solr. I've figured out that using the apachesolr module's schema/solrconfig I can use a query like this:

select?qt=mlt&mlt.fl=title&q=id:jlxay2/node/4910&fl=id

and get some results that make sense.

Trying the same with the Search API Solr schema, I built this:

select?qt=mlt&mlt.fl=t_title&q=id:resources_index-1&fl=id

and guess what, that also works!

I have to test whether this is because of the stored attribute. Thanks in any case; I think I can now start implementing this.
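The working request above can be assembled from a parameter array in plain PHP. `mini_mlt_request_query()` is a hypothetical helper, and the parameter values are the ones from that query. http_build_query() percent-encodes the `:` in the `q` value, which Solr decodes again on its side.

```php
<?php
// Hypothetical helper: build the query string of the MLT request that
// worked above (qt=mlt, mlt.fl=..., q=id:..., fl=id).
function mini_mlt_request_query($index_id, $item_id, array $solr_fields) {
  $params = array(
    'qt' => 'mlt',
    'mlt.fl' => implode(',', $solr_fields),
    // Solr document IDs here have the form "<index>-<item ID>".
    'q' => 'id:' . $index_id . '-' . $item_id,
    'fl' => 'id',
  );
  // Pass '&' explicitly so php.ini's arg_separator.output can't interfere.
  return http_build_query($params, '', '&');
}

echo 'select?' . mini_mlt_request_query('resources_index', 1, array('t_title')), "\n";
// select?qt=mlt&mlt.fl=t_title&q=id%3Aresources_index-1&fl=id
```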

EDIT: have tested with the ss_* dynamic field and adding the stored=true made the difference. Thanks drunken monkey! I could include a patch for the xml files here. I suppose the t_*, ss_*, f_ss_* and f_sm_* would make sense. Any others you could think of?

drunken monkey

I think there are use cases for nearly all field types (although dates are maybe a little far-fetched), so they would probably all need to be stored. Which, I have to add, doesn't really sound good to me, as that would use quite a lot of extra space for just one feature.
However, disk space isn't really expensive, people wanting to avoid this can still set the fields to stored="false" manually, and doing something like providing two different schema.xml files (one for using MLT, and one for using Solr without it) would probably be too messy.
So, we can probably just set all fields to stored, and document in the README.txt when and how this can be changed.

Anyway, glad that this finally worked for you! It's always a pain when something just doesn't work and you can't explain why …

miiimooo

Version: 7.x-1.0-beta7 » 7.x-1.x-dev
File size: 4.95 KB

I'm attaching a first patch for the views argument handler (it should apply neatly against the 7.x-1.x version). Note that all it does is use the setOption method to add the MLT data to the query. I want to check with you whether this looks okay and then go on to first implementing the solr server side of things and finally make the views option actually work. Your feedback would be very helpful. Thanks.

miiimooo

Attached is the second patch that implements the MoreLikeThis feature for the SOLR search service. This is a rudimentary implementation but works. Comments welcome.

mh86

As far as I know, it's better to use termVectors="true" than stored="true" (for more info see Solr MLT documentation http://wiki.apache.org/solr/MoreLikeThis)

miiimooo

Status: Active » Needs review
File size: 5.12 KB

Thanks for the comment. Attached is a patch that changes schema.xml accordingly. The MLT feature works with this as well.

drunken monkey

Thanks for the help, Matthias! It really does work with term vectors instead of stored fields, good to know!

Regarding your patches, Michael: They do apply cleanly, and the functionality works very well. Great work!
I just couldn't figure out how (or, whether) the whole argument passing works – but that is probably a Views issue, and/or an issue with my argument handler (which you are just extending, after all). Does getting an argument for a block view out of the path even work, normally?

Anyways, with the code there are several other issues:
- First off, the way you currently pass the MLT options a) doesn't use correct namespacing and b) is unnecessarily Solr-specific. As said, you should not use the Solr field names directly, but pass the Search API field names and let the Solr service class figure out the Solr field names. The Views handler has to be written in a way that it could be as well used by, e.g., the database backend. Also, you should use a single option with the name of the feature, which would then be an array with the necessary settings. Using several options that don't have the feature name in them and have Solr-specific instead of descriptive names isn't good style for generic features.
- Why only use fulltext or list fields? Just check if the field is indexed, the users should be able to decide themselves what fields they want to use. Or is there a real reason generally not to use those fields?
- Code style: In the Solr backend you have to check if the "qt" option is even present, before checking it.
- You shouldn't name the variable $nid in query(), as this isn't really node-specific.
- Regarding the @todo in query(): Checking it there seems OK. You should then probably throw an exception if the feature is not supported, so the view doesn't get displayed. You should also include a note to that effect into the argument description.
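Taken together, the first, second and last points could look roughly like the helper below. This is a hypothetical sketch, not code from either patch. `$index_fields` mimics the index's field settings with an `indexed` flag, and `$supports_mlt` stands for the result of supportsFeature(). The exception is just PHP's built-in RuntimeException. Falling back to all indexed fields when none are selected follows the description text in the earlier draft handler.

```php
<?php
// Hypothetical helper for an MLT argument handler's query() step.
function mini_build_mlt_option($item_id, array $selected, array $index_fields, $supports_mlt) {
  if (!$supports_mlt) {
    // Abort instead of silently showing unrelated results.
    throw new RuntimeException('The search server does not support "More like this" searches.');
  }
  // Offer every indexed field, regardless of its type.
  $indexed = array_keys(array_filter($index_fields, function ($field) {
    return !empty($field['indexed']);
  }));
  // No selection means "use all indexed fields".
  $fields = $selected ? array_values(array_intersect($selected, $indexed)) : $indexed;
  // A single option named after the feature, holding generic
  // (Search API, not Solr) data.
  return array('search_api_mlt' => array('id' => $item_id, 'fields' => $fields));
}

$index_fields = array(
  'title' => array('type' => 'text', 'indexed' => TRUE),
  'created' => array('type' => 'date', 'indexed' => TRUE),
  'author' => array('type' => 'integer', 'indexed' => FALSE),
);
echo json_encode(mini_build_mlt_option(1, array(), $index_fields, TRUE)), "\n";
// {"search_api_mlt":{"id":1,"fields":["title","created"]}}
```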

Smaller issues with the second patch:

+<!-- $Id: schema.xml,v 1.1.2.1 2010/11/26 21:27:11 drunkenmonkey Exp $ -->

Seems like an old version.

+++ b/solrconfig.xml
@@ -68,9 +68,8 @@
        other than the default ./data under the Solr home.
-       If replication is in use, this should match the replication configuration.
-       Might be problematic for some installs, so we just keep the default. -->
-  <!-- <dataDir>${solr.data.dir:./solr/data}</dataDir> -->
+       If replication is in use, this should match the replication configuration. -->
+  <!--dataDir>${solr.data.dir:./solr/data}</dataDir-->

Don't know why this is included, but I guess it's just a mistake? (Or, maybe, a bad XML correction function in the editor?)

Attached is a patch for the Solr module, using term vectors and fixing those two minor issues.


drunken monkey

Status: Needs review » Needs work

miiimooo

Thanks for all your comments and the careful review, Thomas (and Matthias; should we have a little Austrian flag here?). I'm just about to rewrite the option setting and handling. Regarding the argument handling and passing in the Views part, though, I have a bit of a fundamental problem understanding what to do. What I'm currently "handling" are the fields used to look for similarities (mlt.fl in Solr lingo). I don't think there is much of a problem there. What seems worth discussing is the actual query that goes to the backend. Originally, and in the patch, I only thought of matching on the ID, in effect saying "find documents similar to the one with this item ID, using this list of fields". Do I understand you correctly that you think there should be an ability to match on more than just the item ID? In other words, could the argument passed to the search be anything, and hence need an option with the fields to match on?

drunken monkey

Do I understand you correctly that you think there should be an ability to match on more than just the item id - in other words the argument passed to the search could be anything and hence needs an option with the fields to match on?

No, I don't think so. Passing the ID should be fine – if a backend needs additional data, it can extract it itself anyways.

miiimooo

Here are two new patches against 7.x-1.x:
* fixed passing of MLT values so they use the 'search_api_mlt' option
* implemented the functionality to select the fields on which to look for similarities
* field names in the views argument handler are Search API field names
* added a check to the query() method for whether the server supports MLT
* any indexed field can be used for finding similarities

To test: create a Search API index based view (using a Solr server), add the argument "Search: More Like This", optionally set the validation to Node ID and select some fields to compare on, then run the view with a node ID (e.g. 1).

drunken monkey

Yes, seems to work great!
Two comments, though:
Firstly, I always get the following error in a Javascript popup when saving the argument's option form:

An AJAX HTTP error occurred.
HTTP Result Code: 200
Debugging information follows.
Path: …
StatusText: OK
ResponseText: 
Fatal error:  Call to a member function get_option() on a non-object in …/views/plugins/views_plugin_style.inc on line 35

This doesn't happen for my own argument handlers, as far as I can see.
Secondly, does this already work with blocks? This functionality doesn't really make sense as a page in most cases, creating a block that uses the node currently displayed will usually be what the user wants.

And also a few comments regarding the code, even though it already looks much better:

+++ b/service.inc
@@ -430,10 +431,26 @@ class SearchApiSolrService extends SearchApiAbstractService {
+      $params['fl'] = 'item_id,score';
+      //disable facets
+      unset($facet_params['facet.field']);
+//       $keys = 'id:' . $index->machine_name . '-' . $options['id'];
+      $keys = 'item_id:' . $mlt['query'];

- You don't need to set the "fl" params, this is already the default.
- Even though it's rather improbable that someone would want to display facets for the MLT query, it could be done – and since the user can always disable facets for individual searches anyway (and will have to do so in most complex scenarios), we should just leave the possibility here.
- I think you should use the "id" field again, instead of "item_id", as that is the real identifier. The other one could lead to problems when several indexes lie on the Solr server. Or is there some specific reason why you changed this?
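The ambiguity in the last point can be shown with a toy, in-memory illustration (no Solr involved). When two indexes on the same Solr server both contain item 1, matching on `item_id` alone finds two documents, while the index-prefixed `id` is unique. The documents and the `mini_match()` helper are hypothetical. The ID format matches `resources_index-1` from earlier in the thread.

```php
<?php
// Two hypothetical indexes on one Solr server, both containing item 1.
$docs = array(
  array('id' => 'resources_index-1', 'index_id' => 'resources_index', 'item_id' => '1'),
  array('id' => 'news_index-1', 'index_id' => 'news_index', 'item_id' => '1'),
);

// Return every document whose $field equals $value (exact match).
function mini_match(array $docs, $field, $value) {
  return array_values(array_filter($docs, function ($doc) use ($field, $value) {
    return $doc[$field] === $value;
  }));
}

echo count(mini_match($docs, 'item_id', '1')), "\n";             // 2
echo count(mini_match($docs, 'id', 'resources_index-1')), "\n"; // 1
```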


miiimooo

I can't reproduce the AJAX error you've mentioned. I'm attaching a screen shot with the settings I have. Maybe tell me yours..

Re secondly: I'm using it in a block with the settings shown in the screenshot (take node id from URL).

Re "fl" parameter: if I comment it out I get "Notice: Undefined index: score in SearchApiViewsQuery->addResults() (line 186 of .. search_api/contrib/search_api_views/includes/query.inc)."

Re disabling the facets: I'd suggest to have a switch or something. As you say it normally doesn't make sense. I could add this as an option to the argument handler. My thinking here is also that MLT could be used quite a lot on sites - it's handy to have a MLT block on node pages - and the facets surely introduce some overhead.

Re the "id" field: there is a "fq" limiting to the index in any case. No other reason.

Shadlington

Psssst, you forgot to attach the screenshot! ;)

drunken monkey

I can't reproduce the AJAX error you've mentioned. I'm attaching a screen shot with the settings I have. Maybe tell me yours..

Re secondly: I'm using it in a block with the settings shown in the screenshot (take node id from URL).

You forgot to attach the screenshot. (Can't complain, though, as this happens to me all the time. ;))
But good to know that it works with blocks!
Are you using the latest dev version of Views? Maybe that's what's causing the bug for me.

Re "fl" parameter: if I comment it out I get "Notice: Undefined index: score in SearchApiViewsQuery->addResults() (line 186 of .. search_api/contrib/search_api_views/includes/query.inc)."

Ah, it seems the defaults configured in solrconfig.xml are only for one request handler. Since MLT uses a different request handler, it has its own defaults. Please add

<str name="fl">item_id,score</str>

to the defaults of the mlt handler; then you shouldn't need to set the fl parameter manually with each request.

Re disabling the facets: I'd suggest to have a switch or something. As you say it normally doesn't make sense. I could add this as an option to the argument handler. My thinking here is also that MLT could be used quite a lot on sites - it's handy to have a MLT block on node pages - and the facets surely introduce some overhead.

If someone doesn't want facet blocks for MLT queries, they'll deactivate them. And then, there also won't be any overhead (or at least none that you could prevent in the Solr service class).

Re the "id" field: there is a "fq" limiting to the index in any case. No other reason.

I'm not sure if this is used here. I think it's possible that the fq will only limit the results returned, not the initial selection of the document to match – but I don't really know. In any case, I think it's better to be on the safe side here, and use id directly.

miiimooo

File size: 81.47 KB

Screen shot attached!

Are you using the latest dev version of Views? Maybe that's what's causing the bug for me.

Yes, I'm on 7.x-3.x-dev. I tried downgrading to 7.x-3.0-beta3, but that just breaks Views. Any ideas for a fix? The code mentioned looks pretty harmless to me:

    // Overlay incoming options on top of defaults
    $this->unpack_options($this->options, isset($options) ? $options : $display->handler->get_option('style_options'));
If someone doesn't want facet blocks for MLT queries, they'll deactivate them[..]

The only place to disable facets would be the section in the index AFAIK. So they would need a separate index just for MLT?

Shadlington

You can disable the facet for specific searches on the facet block's settings page.

miiimooo

I'm using a normal Search API index based view but it still sends the facets' parameters in the request to the SOLR server.

drunken monkey

Screen shot attached!

This doesn't look like the newest Views version, but something from before the UI revamp. Now, the term "argument" isn't used anymore and "Node" is called "Content". Maybe you'll get the same error as I with the latest Views version.
However, when making a dirty fix in Views for this issue, everything works fine – even with a block, now.
The issue seems to be that, for some strange reason, $display->handler isn't set in the code you posted, only for your argument handler. As said, I strongly suspect a Views bug and no fault on your side here.

The only place to disable facets would be the section in the index AFAIK. So they would need a separate index just for MLT?

As Shadlington correctly said, you can hand-pick the searches for which a facet block should be displayed on the block's configuration page. Assuming you have more than one search which uses the index in question, that is.
However, at the moment the MLT search will have the same search ID as other queries of that view – so if you are using the same view for MLT and regular searches, this is really a problem.
Just thought about it and this should probably be tackled: attached is a one-line patch that adds the used display to the search ID for views. Please include this in the general MLT patch.
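The reason a per-display search ID helps can be sketched in a few lines. The ID format `search_api_views:<view>:<display>`, the view and display names, and the `mini_facets_enabled()` check are assumptions for illustration, not the attached patch or the facet module's actual code.

```php
<?php
// Hypothetical: include the display in the Views search ID, so an MLT
// block display and a page display of the same view get different IDs.
function mini_views_search_id($view_name, $display_id) {
  return 'search_api_views:' . $view_name . ':' . $display_id;
}

// Hypothetical facet setting: facets are only computed for searches
// whose ID is in the configured list.
function mini_facets_enabled($search_id, array $allowed_search_ids) {
  return in_array($search_id, $allowed_search_ids, TRUE);
}

$page = mini_views_search_id('resources', 'page');
$mlt = mini_views_search_id('resources', 'block_1');
var_dump(mini_facets_enabled($page, array($page))); // bool(true)
var_dump(mini_facets_enabled($mlt, array($page)));  // bool(false)
```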

miiimooo

I've added your patch but I still don't really see anywhere to disable facets.

You can disable the facet for specific searches in the settings page of the facet block

As Shadlington correctly said, you can hand-pick the searches for which a facet block should be displayed on the block's configuration page. Assuming you have more than one search which uses the index in question, that is.

I display the MLT block on a node page. No facets block is enabled or showing. Looking at the request sent to jetty it has the facets in there and I don't have anywhere to disable them:

INFO: [www.****.org] webapp=null path=/select params={f.f_sm_field_region:name.facet.limit=-1&facet.missing=false&facet=true&sort=score+desc&facet.mincount=1&facet.limit=10&qf=t_title&qf=t_body:value&json.nl=map&wt=json&version=1.2&rows=10&f.f_sm_field_theme:name.facet.limit=-1&facet.sort=count&start=0&facet.field=f_sm_field_region:name&facet.field=f_sm_field_theme:name&facet.field=f_sm_field_type:name&f.f_sm_field_type:name.facet.limit=-1&fq=ss_type:"whoswho"&fq=bs_status:"true"&fq=index_id:resources_index} hits=2 status=0 QTime=2 

Maybe we're talking at cross purposes.

drunken monkey

Then what do the facet block's settings look like to you?
And: Does the facet block appear anywhere else?
If the index was used for more than one search since the facets were defined, the "Display for searches" and "Search IDs" options should appear on the facet block's config page.

miiimooo

I actually have a list of facet blocks. But none of them appear on the same page(s) as the MLT block. Does that make sense?

Think about it: in most cases you will display an MLT block together with a node. Usually, you display facet blocks together with listings.

drunken monkey

Of course that makes sense. And I'm still sure that the facet block's config pages will have options to exclude the MLT search from faceting. Whether the blocks are actually displayed on the page (due to block settings) doesn't matter.

miiimooo

Okay I'll leave it there. Attached please find the two patches.

miiimooo

Forgot to rebase...

Here are the correct patches

drunken monkey

Status: Needs work » Needs review

OK, looks good now. I think this is almost ready to go in. Can someone else confirm this is working without problems, too?
If no further problems turn up, there's just some code-cleanup and a bit of documentation (mention and explain the feature in the two READMEs) to do.

And then I should get around to fixing the whole argument handler options at some point. Even though they seem to be equally broken in Views … At least there should be an option to take just a specified part of the path as the argument, without having to specify whether it's a nid, uid, tid or whatever.

Shadlington

If you give me a quick step-by-step of how to use this, I'll test it out; otherwise I'll probably get around to it at the weekend when I have time for trial and error... Unless it's pretty intuitive? I only skimmed over the issue.

miiimooo

* Create a block view using a Search API index that uses a SOLR server
* Add an argument of type "Search: More Like This"
* Set it to "Provide default argument" -> "Content ID from URL"
* Optionally set Validation to Content and Filter value format to Node ID
* Set the block to display together with a node

drunken monkey

Apply the first patch to the Search API and the second one to the Solr backend module. Restart the Solr server and re-index the indexes on it. Then create a new view for a node index lying on the Solr server, add the "Search: More Like This" argument handler (or "contextual filter") and a block display. Finally, enable the block and access a node page.
You should then get a list of related nodes in the Views block.

PS: Damn, slower and less detailed than above! ;)

Shadlington’s picture

Erk. Solr error - Severe errors in my solr configuration!
I'm not sure what I did wrong... Need to pop out for a bit but will be back soon to have another go.

Shadlington’s picture

Oh I figured it out.
The second patch uncomments "${solr.data.dir:./solr/data}".
That's making solr look for the data directory in /var/lib/tomcat6/solr/data/ but that's not where mine is.

Also I noticed a small typo in a comment in the first patch: 'thsi' should be 'this' in the 4th line of handler_argument_more_like_this.inc.

...Will continue testing in a minute, gotta make dinner.

Shadlington’s picture

MLT argument handler is apparently broken: "The handler for this item is broken or missing and cannot be used. If a module provided the handler and was disabled, re-enabling the module may restore it. Otherwise, you should probably delete this item."

EDIT: More detail would be helpful I suppose, but I'm not sure there's much to give.
Used the wizard to quickly create a block view for my solr index. Immediately went to add the argument without changing anything and all I see is the message about the handler being broken. Hmm.

drunken monkey’s picture

> Oh I figured it out.
> The second patch uncomments "${solr.data.dir:./solr/data}".
> That's making solr look for the data directory in /var/lib/tomcat6/solr/data/ but that's not where mine is.

Oh, that's still in there? I criticized that already back in #20. miiimooo, please correct this.

> MLT argument handler is apparently broken: "The handler for this item is broken or missing and cannot be used. If a module provided the handler and was disabled, re-enabling the module may restore it. Otherwise, you should probably delete this item."

Have you cleared all caches? Seems like Drupal didn't pick up the changed .info file …
Otherwise, I'm at a loss.

Shadlington’s picture

*facepalm*

It was the caches. So silly of me... I knew I was forgetting something.

Anyway, for the record I get the same ajax error you were getting. Was on the latest dev but went back to the beta and still got it.

I am struggling to get the block to display though. Under what conditions should it display?
I didn't put any validation on the argument and didn't put any restrictions on the block. Stuck the block in a sidebar. No joy.

Shadlington’s picture

Added empty text to the view and the block shows, so the problem is that it's not finding anything.

I set the argument to match on a taxonomy field and made sure I had a bunch of nodes tagged with terms from the taxonomy, with plenty sharing the same terms. Have I misunderstood how MLT matches similar content?

drunken monkey’s picture

> Anyway, for the record I get the same ajax error you were getting. Was on the latest dev but went back to the beta and still got it.

Apply the attached patch to Views and it should stop doing that.

> I am struggling to get the block to display though. Under what conditions should it display?
> I didn't put any validation on the argument and didn't put any restrictions on the block. Stuck the block in a sidebar. No joy.

Had problems, too. Views arguments configuration is a science in itself. Try it with the settings miiimooo posted in #29 and #41.
Also, check whether you get any results in the preview on the Views edit page when you enter an argument, to pinpoint where this fails.
Also: Did you restart and re-index the server? Otherwise, MLT queries won't match anything, as they have no data to work on.

Shadlington’s picture

Thanks for that patch, it does the trick.
Yeah I did try the preview without any luck too :(

...Oh hey. I just tried something. I didn't select any fields to be used for matching on (so all available are used) and it found them.

The thing is, I can't tell how it's finding related content. It seems fairly random. I can't see why it's not just finding every other node, rather than the subset it has found.

drunken monkey’s picture


Hm …
Come to think of it, storing term vectors for non-fulltext fields doesn't make much sense. They aren't tokenized, so they will always have just the one term (unless they are multivalued …). Well, at least I'd think so, but I'm far from an expert.
Please test whether this works better with the attached schema.xml.
As it is, Solr might only be using the fulltext fields to compute similarity. However, users on production sites will probably want to customize this anyway, e.g. to boost some fields.

Shadlington’s picture

Ah. I didn't understand that this was only comparing fulltext fields, sorry about that.

I'm not sure what difference that schema made.
Since I started to get results I'm really not sure how to measure how well this is working!

One thing I noticed as I clicked through the suggested content (I changed the view to output the title linked to the node) was that however Solr decides content is related, it doesn't seem to be two-way.
i.e. The following scenario is possible:
You go to node X and MLT suggests node Y. You go to node Y and MLT does not suggest node X.

That seems odd to me, but then I don't know what MLT is doing. Until just now I figured it'd compare the taxonomy terms but I was completely wrong about that (well... I guess it'd compare them if I created an aggregated field of them for MLT to use).

Shadlington’s picture

Anyway, I guess my problem here is that I don't know how to measure 'success'.
The MLT view is returning results. This is cool! But I don't know how to be sure it's doing it correctly. Any pointers?

drunken monkey’s picture

> Ah. I didn't understand that this was only comparing fulltext fields, sorry about that.
> […]
> Until just now I figured it'd compare the taxonomy terms but I was completely wrong about that

Sorry, didn't want to sound too confident: that was just my own guess, it may well be nonsense. Also, I didn't mean to say it compared only (and even less that it should only compare) fulltext terms, but that it maybe does so with the current settings (using term vectors for all fields). After all, taxonomy terms might really give the best indication regarding the "relatedness" of two items.

As to measuring success, I'm also not sure. I previously had the feeling that it worked, and have now partly confirmed this with two test nodes with similar titles, body and content. But with those, I could also reproduce your problem of missing symmetry, which is really weird.
The request seems to be alright, though, this apparently is some problem with Solr, or with the configuration. Don't really know what we can do there, with none of us being a Solr expert.
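
One plausible explanation for the missing symmetry: MLT builds its query only from the source item's highest-scoring terms (capped by mlt.maxqt) and then returns only the top-ranked matches, so Y can be among X's best matches while X is not among Y's. The sketch below is a toy model of that idea, not Solr's actual scoring: it ranks terms by a simple tf*idf, keeps the top `max_query_terms`, and ranks the other documents by how many of those terms they share. The function names and the example data are made up for illustration.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data): document ID -> list of indexed terms.
EXAMPLE_DOCS = {
    "X": ["solr", "solr", "search"],
    "Y": ["solr", "search", "views", "views", "views", "blocks", "blocks"],
    "Z": ["views", "blocks", "theme"],
}


def interesting_terms(doc_id, docs, max_query_terms=2, min_tf=1, min_df=1):
    """Return the source document's top terms, ranked by a simple tf * idf score."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    tf = Counter(docs[doc_id])
    scored = []
    for term, freq in tf.items():
        if freq < min_tf or df[term] < min_df:
            continue
        idf = math.log(n_docs / df[term]) + 1.0
        scored.append((freq * idf, term))
    # Highest score first; ties broken alphabetically so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [term for _, term in scored[:max_query_terms]]


def more_like_this(doc_id, docs, rows=1, **term_options):
    """Rank the other documents by how many of the source's interesting terms they contain."""
    query_terms = set(interesting_terms(doc_id, docs, **term_options))
    hits = []
    for other_id, terms in docs.items():
        if other_id == doc_id:
            continue
        overlap = len(query_terms & set(terms))
        if overlap:
            hits.append((overlap, other_id))
    hits.sort(key=lambda item: (-item[0], item[1]))
    return [other_id for _, other_id in hits[:rows]]


if __name__ == "__main__":
    print(more_like_this("X", EXAMPLE_DOCS))  # ['Y']: X's top terms "solr", "search" occur only in Y
    print(more_like_this("Y", EXAMPLE_DOCS))  # ['Z']: Y's top terms "views", "blocks" occur only in Z
```

In this model X suggests Y, while Y suggests Z and never X. Letting more of Y's terms into the query can bring X back, but nothing in the approach guarantees symmetry.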

davidseth’s picture

Status: Needs review » Reviewed & tested by the community

I have tested the MLT stuff. Works!! I love you guys!

The results are a bit different though, not quite what I would expect. I am happy to dig into this. I have an Apache Solr install with the exact same criteria running and can compare the two. I get a few similarities, but a lot of differences. I will dig into this.

But the big thing, works really well! Yay!

Cheers,

David

Shadlington’s picture

I'm not sure we can call this RTBC when we're still openly questioning its validity - but certainly thank you for joining in.
A comparison with how the Apache Solr module is doing it is helpful - those guys *are* Solr experts.

...Maybe we should ask one of them for input?

drunken monkey’s picture

Status: Reviewed & tested by the community » Needs review

Agreed, we sure need to first confirm that we aren't making any mistakes here, before committing this.
Also, this still needs code cleanup and documentation.

David, if you also have an apachesolr install, could you check the Solr log and copy-paste an MLT request in here? And also an MLT request sent by the Search API (or just the fields you are using for MLT).
The relevant portion of solrconfig.xml is identical, so that can't be the reason for the differences. If they also use about the same request, the only remaining difference could be the schema.xml. They store all fields and additionally add term vectors for some of them – but I don't think that would make the difference.

davidseth’s picture

Okay, I have identical configurations for these two blocks. The first one is coming from search_api with MLT patches. The second one is coming from Apache Solr module.

I have them both set to deliver node types of Article and both set to use 3 vocabularies to filter on (tm_vid_1_names, tm_vid_3_names, tm_vid_6_names).

The Search API query is *much* bigger and has things like faceting in it, which should not be there. The big issue is that mlt.fl does not appear in the Search API query. Without it, the MLT query isn't restricted to anything, so the MLT engine will essentially do nothing but spit back results. We need to pass at least something into it to get some relevance.

Search API (with MLT patches)

webapp=/solr path=/select params={facet.missing=false&facet=true&facet.mincount=1&facet.limit=10&qf=t_search_api_viewed&qf=t_title&qf=t_body:value&json.nl=map&wt=json&rows=5&version=1.2&start=0&facet.sort=count&facet.field=im_field_project&facet.field=ds_field_publication_date&facet.field=im_field_ref_content&facet.field=im_field_ref_group&facet.field=im_field_tags&facet.field=im_field_voc_author&facet.field=is_field_voc_event_type&facet.field=im_field_voc_group_type&facet.field=im_field_voc_location&facet.field=im_field_voc_nailsma_type&facet.field=is_field_voc_newsletter_name&facet.field=im_field_voc_person_type&facet.field=is_field_voc_publication_type&facet.field=im_field_voc_region&facet.field=f_ss_type&fq=ss_type:"article"&fq=index_id:node} hits=27 status=0 QTime=3 

Apache Solr

webapp=/solr path=/select params={mlt.minwl=3&mlt.fl=tm_vid_1_names,tm_vid_3_names,tm_vid_6_names&mlt.mintf=1&mlt.maxwl=15&mlt.maxqt=20&json.nl=map&wt=json&rows=4&mlt.mindf=1&fl=nid,label,path,url,teaser,ss_image_url,ss_file_url,type,tm_node&start=0&q=id:f0tasb/node/37&qt=mlt&fq=(bundle:article)+} status=0 QTime=36

Shadlington’s picture

Yeah "do nothing but spit back results" describes what I'm seeing. As I said before, I didn't see much of a pattern of 'relatedness'.

The facets thing might be due to facet blocks not being correctly configured, as was discussed earlier (though I really don't know, maybe it is just sticking the facets in anyway).

drunken monkey’s picture

For me, the mlt.fl parameter is correctly passed. Did you set any fields in the MLT argument handler? (Maybe, when none are set, this should fall back to a reasonable default? Probably, all fields should be passed, then.)
If I manually remove the mlt.fl parameter from the query, I get just a Solr error, so I'm surprised this even produces any results for you.

And as Shadlington said, you'll have to deactivate the facets for the MLT queries for them not to be sent.
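
A quick way to tell whether a logged request was really an MLT request is to pull the params={...} part out of the Solr log line and check for qt=mlt and a non-empty mlt.fl. The helpers below are a minimal sketch (hypothetical names, not part of either module) that assumes the log format of the two lines David posted, with the parameters between `params={` and the last `}` on the line.

```python
from urllib.parse import parse_qs


def parse_solr_log_params(log_line):
    """Extract the params={...} section of a Solr request log line as a dict of lists."""
    start = log_line.index("params={") + len("params={")
    end = log_line.rindex("}")  # assumes the last "}" on the line closes params
    return parse_qs(log_line[start:end], keep_blank_values=True)


def mlt_fields(params):
    """Return the fields listed in mlt.fl, or an empty list if it is missing."""
    values = params.get("mlt.fl", [])
    return [field for value in values for field in value.split(",") if field]


def looks_like_mlt_request(params):
    """True if the request targets the MLT handler and names fields to compare."""
    return params.get("qt") == ["mlt"] and bool(mlt_fields(params))


if __name__ == "__main__":
    line = ("webapp=/solr path=/select params={mlt.fl=tm_vid_1_names,tm_vid_3_names"
            "&qt=mlt&q=id:f0tasb/node/37&rows=4} status=0 QTime=36")
    params = parse_solr_log_params(line)
    print(looks_like_mlt_request(params), mlt_fields(params))
```

Run against the Search API line above, this reports no MLT request, since neither qt=mlt nor mlt.fl is present.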

davidseth’s picture

How do I deactivate the facets for the MLT View block?

drunken monkey’s picture

In the corresponding facet block settings, deactivate them for the search ID that corresponds to the MLT view. E.g., when the MLT view has the machine name "mlt", the search ID will probably be "search_api_views:mlt:block".
This should stop the Facets module from attaching the facet parameters.

(If you don't update the module to fix #1129226: Incorrect handling of facets deactivated for some search IDs, you even have the advantage of needing to set this for only a single facet block! ;))

davidseth’s picture

Great. Thanks for the quick feedback. I wasn't aware of what those settings were for until now. This module is quite advanced! Are these settings documented somewhere that I missed? I hate to take up your time if I could have read it somewhere else.

Shadlington’s picture

It's in the search facets README file :)

davidseth’s picture

Cool. I did read it then, just went straight over my head until I actually used it! This module is good stuff, just a big learning curve. And I have been doing this stuff for years now. I sorta like modules that make me feel like a newb!

I have a few suggestions for adding additional MLT params, but will start a new issue for that.

Shadlington’s picture

I totally agree, it is a big learning curve. There's just so much it can do though, and I think Thomas has done a brilliant job.
Still, better documentation is on his to do list and I've been thinking it could probably do with some tutorials.
I have a bunch of notes that almost form a tutorial as it is - just need writing up - so I may end up creating a few handbook pages at some point.

drunken monkey’s picture

Would be great if you could provide those notes of yours! As a handbook page, if you have the time, or just send them to me so I can use them as a starting point when tackling documentation.
As you said, documentation is definitely on my TODO list. I'm aware that non-developer documentation is sadly lagging behind a bit, and much of the Search API and related modules isn't yet quite as usable as it should be.

However, back to the issue at hand: David, have you set the fields to use in the argument handler, and has this changed the Solr query – and, more importantly, improved the results?

davidseth’s picture

Yes, I was able to set the blocks so that they were only used on other search IDs. So now my solr queries look like this:

mlt.fl=t_title,im_field_project,im_field_ref_group,sm_field_tags:name,sm_field_voc_region:name,sm_field_voc_location:name,sm_field_voc_sub_themes:name&start=0&q=id:node-37&qf=t_search_api_viewed&qf=t_title&qf=t_body:value&json.nl=map&qt=mlt&wt=json&fq=((((ss_type:"audio")+OR+(ss_type:"publication")+OR+(ss_type:"video"))))&fq=index_id:node&version=1.2&rows=5

And I do get back results, but only when I added sm_field_voc_region:name (it is in the 'Add related fields' drop down area and is a taxonomy) on the Fields config screen for my search index. When I use the im_field_voc_region (which was in the list of Fields automatically) I didn't get any results.

Why does im_field_voc_region (which has TIDs) not work, but sm_field_voc_region:name (which has the term names) does?

When I was using this with Apache Solr, it worked straight away. But Apache Solr uses the string names of the taxonomies by default.

I bring this up because it is a lot of work to get MLT to work, so what can be done to automate this? Maybe by automatically using the term name instead of the tid? But again, I don't know why there should be any difference...

Thanks.

drunken monkey’s picture

> Why does im_field_voc_region (which has TIDs) not work, but sm_field_voc_region:name (which has the term names) does?

I don't really know, I'm no Solr expert. Maybe MLT really only works with text fields? Or, as mentioned above, maybe term vectors only work with text fields? (In Solr, strings and fulltext fields are both text fields, just with different processing.)
Could you maybe try to use stored="true" instead of termVectors="true" for the "is_*" and "im_*" dynamic fields in your schema.xml file, and then see (after re-starting Solr and re-indexing) if using TIDs still doesn't work?

And are the results now (when using the term names) valid, and similar to those that the apachesolr module displays?

> I bring this up because it is a lot of work to get MLT to work, so what can be done to automate this? Maybe by automatically using the term name instead of the tid? But again, I don't know why there should be any difference...

We can't really do this, as we can't dictate the fields the user wants to index. But I of course agree that this should be simpler, and all gotchas should be properly documented.
However, first we need to figure out why using the TIDs doesn't work. Maybe we can fix this right away, so we don't need to mess with complex settings or documentation issues.

davidseth’s picture

Changing to stored="true" had no effect. Still no results. Only when I use text terms do I get results. Perhaps MLT doesn't work with passing in anything other than text.

mh86’s picture

I once had similar problems with the field type "tlong", which is used for integers and term ids. Try to change it to "long" and see if it's working.

marvil07’s picture

miiimooo’s picture

@davidseth #58: the request here is not sent by the MLT argument handler. It should have at least qt=mlt and mlt.fl. This must be some other request.

I've only tested the MLT Solr feature with text fields, in which case the results are symmetric. I'll see if I have some more time for this, but it would be really good if it could get committed, as the functionality is there in principle and would also allow other search implementations to offer the MLT feature.

I'm attaching an updated search_api_solr patch just with the dataDir setting corrected.

davidseth’s picture

Yes, see #68. I was able to get a much cleaner request. My main thing is that out of the box, MLT on taxonomies does not seem to be working. It isn't until I add the Taxonomy Names that I can get some MLT results.

I am going to try suggestion by @mh86 at #71 and see if I can get any joy.

drunken monkey’s picture

> I'll see if I have some more time for this but it would be really good if it could get committed as the functionality is there in principle and would also allow other search implementations to offer the MLT feature.

As said, some code cleanup (that part I'd do myself) and documentation (for the feature, the Solr service class and the argument handler – especially for the last one you'll have to make clear somewhere that it can only be used with certain backends (the argument description, maybe?)) is still needed in any case.
Also, we should really confirm that this works right on Solr, or that we at least document known issues so we don't wilfully get other people as confused as we are right now. ;)

And since this will get committed in any case, and the way the feature works is already pretty much fixed, I don't think (e.g.) Marco will have problems developing this for Xapian before the patch is committed. Except that there is no documentation. ;)

@ David: Please do that, would be interesting to know.

davidseth’s picture

Status: Needs review » Reviewed & tested by the community

In reference to #74, it didn't work. MLT does not appear to work at all on the tid. It only works when I add the term >> name to the index and use those to do my MLT queries. Then they work great. And I can confirm that I can get it to now give me the same results as when I was using apachesolr 7.x-dev. So that is a good thing.

So, as long as this is documented in MLT (for solr at least), then this should be good to go.

Cheers,

David

mh86’s picture

There is also a "mlt.minwl" (minimum word length) parameter, which is set to "3". I would expect this setting only applies for text values, but now I'm not really sure. Maybe it's also used for integers. And in principle, I don't see any reason, why MLT shouldn't work with integers.
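
To see how a minimum word length could hide taxonomy term IDs, here is a simplified model of the length check behind mlt.minwl and mlt.maxwl: terms shorter than the minimum or longer than the maximum are skipped when the MLT query is built. It assumes, as guessed here but not verified, that numeric values are compared as their plain decimal strings; Solr's numeric field types may index them differently. In this sketch a bound of 0 disables that check, and the helper names and example values are hypothetical.

```python
def is_noise_word(term, min_word_len=0, max_word_len=0):
    """Return True if the term would be skipped because of its length (0 disables a bound)."""
    length = len(term)
    if min_word_len > 0 and length < min_word_len:
        return True
    if max_word_len > 0 and length > max_word_len:
        return True
    return False


def usable_terms(terms, min_word_len=0, max_word_len=0):
    """Keep only the terms that pass the length check."""
    return [t for t in terms if not is_noise_word(t, min_word_len, max_word_len)]


if __name__ == "__main__":
    tids = ["7", "37", "112"]  # hypothetical term IDs, treated as strings
    print(usable_terms(tids, min_word_len=3, max_word_len=15))  # ['112']: "7" and "37" are too short
    print(usable_terms(tids, min_word_len=1, max_word_len=15))  # ['7', '37', '112']
```

Under this model, with mlt.minwl at 3 most small TIDs never make it into the MLT query at all, which would match the "no results on TIDs" symptom.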

drunken monkey’s picture

Status: Reviewed & tested by the community » Needs work

Good point! Internally, Solr (or, at least, Lucene) treats everything as strings, so I wouldn't be too sure about mlt.minwl just being used for text data.

David, could you just test that one last thing (setting "mlt.minwl" to 1 in solrconfig.xml), too? If this really is the reason, we could just set the f.$field.mlt.minwl parameters accordingly when creating the Solr request.
Otherwise, yes, we should probably just document this for now. Maybe we'll find out the reason and fix this some time later.
However, this still needs work (cleanup and documentation).

marvil07’s picture

> And since this will get committed in any case, and the way the feature works is already pretty much fixed, I don't think (e.g.) Marco will have problems developing this for Xapian before the patch is committed. Except that there is no documentation. ;)

@drunken monkey: I am kind of confused by this issue. Can you point me to the patch I should apply to search_api? (The conversation/patches here seem to be for both search_api and search_api_solr.) BTW, two implementations of the API are better than one to make sure the API feature is in good shape.

drunken monkey’s picture

It's this one, the first in #38.
In the service class' search() method you'll then have to look for the "search_api_mlt" option, which will be (when present) an array containing "query" (the entity ID for which to find similar results) and "fields" (the indexed fields which should be used for comparison).
The corresponding feature for supportsFeature() is "search_api_mlt".

@ miiimooo: I think we should rename the array key "query" to "id", "entity" or "entity id"/"entity_id", as that would be much clearer. It isn't a query, after all, but just the entity ID.
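
For backend implementers, the translation the Solr service has to do can be sketched roughly as follows. This is a hypothetical Python illustration, not the module's PHP code: it takes a "search_api_mlt"-style option (using "id" instead of "query", as proposed above), maps index field names to Solr field names through a caller-supplied dict, and builds parameters shaped like the request David posted (qt=mlt, q=id:node-37, mlt.fl, fq=index_id:node). The `index_id` default and the `id:<index>-<item>` format are assumptions read off that log line.

```python
def mlt_params(option, solr_field_names, index_id="node", rows=5):
    """Build Solr MLT request parameters from a search_api_mlt-style option.

    option: {"id": <entity ID>, "fields": [<index field names>]}
    solr_field_names: hypothetical mapping from index field names to Solr field names.
    """
    fields = [solr_field_names[name] for name in option.get("fields", [])]
    if not fields:
        # drunken monkey got a Solr error when mlt.fl was missing, so refuse early.
        raise ValueError("search_api_mlt needs at least one field for mlt.fl")
    return {
        "qt": "mlt",
        "q": f"id:{index_id}-{option['id']}",
        "mlt.fl": ",".join(fields),
        "fq": f"index_id:{index_id}",
        "rows": str(rows),
    }


if __name__ == "__main__":
    print(mlt_params(
        {"id": 37, "fields": ["title", "field_tags:name"]},
        {"title": "t_title", "field_tags:name": "sm_field_tags:name"},
    ))
```

A backend would only be handed this option after reporting "search_api_mlt" from supportsFeature(), so the mapping can assume MLT was explicitly requested.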

miiimooo’s picture

@drunken_monkey: I'm a bit confused by the git repository - could you maybe help? I've just come back from leave and made this change (query->id). Then I did:

search_api_solr$ git fetch origin
search_api_solr$ git rebase origin/master
search_api_solr$ git commit -a -m "Issue #1111852 by miiimooo: renamed parameter from query to id"
search_api_solr$ git diff origin/7.x-1.x > 1111852-search-api-solr-more-like-this-81-D7.patch

This diff removes all the stuff that you must have added in the meanwhile. any ideas where I'm going wrong?

EDIT: I think I know what I did wrong here

miiimooo’s picture

Here is the updated patch after a minimal change (query->id).

drunken monkey’s picture

Great, thanks!

OK, what is there still to do?
- A bit of cleanup (TODOs, remove commented-out method (I don't think we should change the title))
- Some documentation for the feature
- David, or someone else, should test whether setting mlt.minwl to 0 solves the problem of ignored taxonomy terms.
Otherwise, this seems fine.

davidseth’s picture

@drunken monkey regarding #83, I can confirm that setting mlt.minwl = 0 does work. I now don't have to use the taxonomy >> name as I was before. I have also tested it with mlt.minwl = 1 and it works as well.

So this makes it much easier for end users to create a MLT block with the default exposed taxonomy tids and not have to dig deeper into the >> name stuff.

Cheers,

David

drunken monkey’s picture

OK, great to hear!
Then we should either set minwl directly to 0 in the solrconfig.xml, or set it to 0 for non-text fields manually at request time. (Or the other way round, setting it manually to, e.g., 3 for text fields.)

miiimooo’s picture

@drunken monkey so I suppose in the views argument handler it should go through the list of fields, and see if all of them are numeric fields and if that isn't the case add a parameter.. nah I'm confused. I guess the problem is with when more than one or all fields are selected.

drunken monkey’s picture

No, this wouldn't have to be done by the Views handler but entirely be on the Solr end. Also, Solr can usually override settings on a per-field basis, so we'd just have to set "f.FIELD.mlt.minwl" to 0 for all non-text fields.

See attached patch for a shot at this. Hadn't time to test it, though.
Also, at least in #82, you added the MLT code to the wrong method, searchMultiple() instead of search(). That might have been another reason for this not working as expected.
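
A minimal sketch of the per-field idea: emit f.<field>.mlt.minwl=0 for every non-fulltext field in mlt.fl and leave fulltext fields on the solrconfig.xml default. Treating the "t_" and "tm_" prefixes as fulltext is an assumption read off field names in the logs above (t_title, tm_vid_1_names), and whether Solr's MLT handler honours these per-field overrides is exactly what the untested patch still needed to confirm. The helper names are hypothetical.

```python
# Hypothetical: prefixes assumed to mark fulltext fields, based on names like t_title and tm_vid_1_names.
FULLTEXT_PREFIXES = ("t_", "tm_")


def is_fulltext(solr_field):
    """Guess from the field name prefix whether a Solr field is fulltext."""
    return solr_field.startswith(FULLTEXT_PREFIXES)


def non_text_minwl_overrides(solr_fields, min_word_len=0):
    """Per-field mlt.minwl overrides for every field that is not fulltext."""
    return {
        f"f.{field}.mlt.minwl": str(min_word_len)
        for field in solr_fields
        if not is_fulltext(field)
    }


if __name__ == "__main__":
    fields = ["t_title", "im_field_project", "sm_field_tags:name"]
    # {'f.im_field_project.mlt.minwl': '0', 'f.sm_field_tags:name.mlt.minwl': '0'}
    print(non_text_minwl_overrides(fields))
```

Doing this at request time, rather than lowering mlt.minwl globally in solrconfig.xml, keeps the existing minimum word length for fulltext fields.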

drunken monkey’s picture

Ravi.J’s picture

This patch works great; quick question, however. I think the MLT feature through a Views handler should be part of the search_api_solr project, as the argument handler only works with the Solr backend.

drunken monkey’s picture

No, this is clearly specified as an (optional) feature. Other search backends could support this, too; for Xapian this is even already being worked on: #1042694: Implement more like this search api feature with xapian expand terms.

Anonymous’s picture

Thanks for this much-wanted feature! But I'm having trouble getting it to work.

For a regular node page, node/x, I'm showing a block of MLT nodes. I set up the argument Search: MLT with the default argument Content ID from Node URL.

It's just that there are no results. The View preview (with a manual nid set) doesn't show a query or anything. There are no errors or watchdog messages. I'm using Fulltext for fields for similarity (and other fields selected as well).

(I've changed my schema/config.xml's, rebooted Tomcat/solr and re-indexed my indexes)

...I'll try more tomorrow.

drunken monkey’s picture

So this doesn't even work in the Views preview for you? That's weird, even before I managed to set this up correctly the Views preview worked for me.
Also be sure to set the contextual filter options like miiimooo in #29 (even though, as said, the preview worked for me in any case).

Oh, also: Did you apply both patches, for search_api and for search_api_solr? (OK, if you hadn't, you should get an exception, so that probably won't be it …)

drunken monkey’s picture

GSoC has kicked off, so this has now officially top priority. ;)
(Although, of course, there is only little left to do.)

@ morningtime: Are you still having problems getting this to work?

drunken monkey’s picture

Attached is a version that is RTBC, in my opinion:
- The "search_api_mlt" feature is clearly documented in search_api_views' README.txt.
- TODOs were removed.
- Solr patch updated to latest changes, and also to enable searches on float and date fields.
Please test!

drunken monkey’s picture

Anyone who can test this?

klausi’s picture

Status: Needs review » Reviewed & tested by the community

Tested the patches, and I can confirm they work!

Matching based on node titles works fine, however I would also like to match against taxonomy terms. If I select only a vocabulary in "Fields for Similarity", then the MLT search never returns a result. If two nodes share the same term, I would like them to appear in the MLT block. I also tried Vocabulary->term name and Vocabulary->term id in "Fields for Similarity", but no luck.

What am I doing wrong? Can you make it work with taxonomy terms in your install?

drunken monkey’s picture

You mean a term reference field, not a vocabulary, right?
Sadly, I can indeed confirm that this is the case – when selecting only the term reference, no results are returned. (Even after some manual playing around with the Solr query.) David said in #84 that it does now work for him, though.
@ David: Does this still work for you with the latest patches?

Also, using the term name instead of the ID did work, I can't reproduce that part.

I also tested whether this maybe has something to do with multivalued fields, but it doesn't work for single-valued term references, either. Changing the Solr field type or setting stored to true for all fields also didn't help.

Anyway, thanks a lot for testing!
I guess if we don't find the solution for the remaining issue in the next few days, I'll just commit this and we can search for improvements afterwards.

drunken monkey’s picture

Project: Search API » Search API Solr
Component: Plugins » Code
Status: Reviewed & tested by the community » Needs work

OK, I just committed this.
Thanks to all helpers – especially miiimooo, of course!

Unsure what to do now. Set to "fixed" and open a new one for fixing the remaining issue? Or set this back to "needs work"?
Opted for the latter for the moment, and moved to the Solr project.

klausi’s picture

Ok, now I managed to create a MLT block that matches nodes with the same taxonomy term: you have to make sure that you index the field (e.g. "Tags >> name") as string and not as fulltext. Furthermore don't forget to re-index after you updated a node for testing, otherwise you get bogus results. This works for the most basic use case of articles and tags.

However, still no luck on my more complex search index with multiple taxonomy term reference fields, still investigating.

klausi’s picture

Further tests revealed that MLT does only work for a certain maximum taxonomy term length. I found that the matching works for terms with a length up to 16 characters. Longer term names do not produce any matching results.

klausi’s picture

Note to self: RTFM http://wiki.apache.org/solr/MoreLikeThis
mlt.maxwl ... maximum word length above which words will be ignored.

This setting is 15 characters in the default solrconfig.xml that ships with this module. Tuning it up to 30 characters solved the term matching problems for me. Maybe we should think about increasing this default setting, as 15 characters is pretty low, especially for languages like German where long words are very common. Don't know if that might have any performance relevance though.

klausi’s picture

Status: Needs work » Closed (fixed)

Follow up about that: #1181260: Increase mlt.maxwl in solrconfig.xml
I think this issue has been long enough, so let's start to break out the remaining problems in separate issues.

Shadlington’s picture

Status: Closed (fixed) » Needs work

Hmmm. That max word length setting - does that apply to the entire length of a term, even if the term is composed of multiple words? I often have terms made up of 2-3 words.
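
Whether the limit hits the whole term depends on how the field is indexed. The sketch below is a deliberately simplified model (lowercasing plus a whitespace split for fulltext; none of the real schema.xml analyzers, stemming or stopwords): a string field keeps "Cape York Peninsula" as one 19-character token, which a 15-character mlt.maxwl would drop, while a fulltext field yields three short words that pass. The example term and helper names are illustrative only.

```python
def index_tokens(value, field_type):
    """Simplified indexing: string fields keep the whole value, text fields split into words."""
    if field_type == "string":
        return [value]
    return value.lower().split()


def tokens_within_maxwl(value, field_type, max_word_len=15):
    """Return the tokens short enough to survive an mlt.maxwl-style length limit."""
    return [tok for tok in index_tokens(value, field_type) if len(tok) <= max_word_len]


if __name__ == "__main__":
    term = "Cape York Peninsula"  # 19 characters including spaces
    print(tokens_within_maxwl(term, "string"))                   # []
    print(tokens_within_maxwl(term, "string", max_word_len=30))  # ['Cape York Peninsula']
    print(tokens_within_maxwl(term, "text"))                     # ['cape', 'york', 'peninsula']
```

So for terms indexed as strings (the setup klausi describes above), a multi-word term would plausibly be measured as a whole.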

Shadlington’s picture

Status: Needs work » Closed (fixed)

Uh. Not sure what happened there.

miiimooo’s picture

Status: Closed (fixed) » Needs work

Hmm.. I think something has changed with the format of SearchApiIndex->options['fields']. There used to be a 'name' key but that's not in the current dev version so the list of "fields for similarity" looks a bit odd

drunken monkey’s picture

Status: Needs work » Closed (fixed)

pinkonomy’s picture

Could this patch be included as a submodule of Search API Solr?

zilla’s picture

A submodule would be nice, but it would need a fairly robust README to help with things like required string phrases (if applicable) or other basic setups.

And btw, what happened to this? It was marked as a complete first submission on the wiki page with a link back to this thread, but there's no breadcrumb to follow from the Search API project page and no other clues...

How are people currently implementing this?

pinkonomy’s picture

Finally, how can we implement the "More like this" feature? I am a bit confused :)

Exploratus’s picture


Is this implemented? It's not very clear from the thread. Thanks!