Restrict MLT list to nodes of certain types, or the same type as current
David Stosik - February 11, 2009 - 15:38
| Project: | Apache Solr Search Integration |
| Version: | 6.x-2.x-dev |
| Component: | More Like This |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs review |
| Issue tags: | drupal.org redesign |
Description
Would it be possible to add an option to the MLT block configuration that restricts results to certain node types, or to the current node's type?

#1
As long as you're using the MoreLikeThis handler, you should be able to use an fq (filter query) parameter to restrict the results to specific nodes. See http://wiki.apache.org/solr/MoreLikeThisHandler and http://wiki.apache.org/solr/MoreLikeThis
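A minimal sketch of what such a request could look like, for anyone following along. The helper name `example_mlt_params()` is hypothetical, and the field names `id`, `type`, `title` and `body` are assumptions about the index schema; only `q`, `fq`, `rows` and `mlt.fl` are standard Solr parameters.

```php
<?php
/**
 * Hypothetical helper: builds request parameters for Solr's MoreLikeThis
 * handler, restricted to a set of content types through fq.
 */
function example_mlt_params($document_id, array $types) {
  $params = array(
    // The document to find similar items for (assumed field name: id).
    'q' => 'id:' . $document_id,
    // Fields used to compute similarity (assumed field names).
    'mlt.fl' => 'title,body',
    'rows' => 5,
  );
  if (!empty($types)) {
    // fq narrows the result set without affecting relevance scoring.
    $params['fq'] = 'type:(' . implode(' OR ', $types) . ')';
  }
  return $params;
}

$params = example_mlt_params('42', array('story', 'page'));
// URL-encoded query string, ready to append to the handler path.
echo http_build_query($params), "\n";
```

With an empty type list no fq is sent, so the block falls back to unrestricted suggestions.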
#2
Here is a patch to add a simple sub query field to the recommendations block. The sub query field is added as a fq parameter, to limit the mlt results.
#3
Version 2,
Removed check_plain
Fixed the settings form validation
#4
I think having a bare text field is pretty user-unfriendly. How about a multi-select or checkboxes for all the available types? Also, this should probably take into account types that are already being excluded from the index.
#5
I considered that, however I thought that someone might want to do more than just filtering on type, such as only show items for a certain category, user, cckfield, date range, etc.
#6
@mikejoconnor - I think for that sort of thing they need a custom module to modify the query. The UI should just have some basic, user-friendly options.
#7
I'm in agreement. Our beta tests suggest that screen needs a redesign as few of our users were able to figure it out. Having a default created helped a lot in this regard.
I think we could:
1). Add a hook as Peter is suggesting (I think)
2). Add a variable which can be set in settings.php if someone so desires and document it.
#8
@Jacob - we should already be running the alter hook on the query, right? If not, then I'd make that the way to alter it.
#9
Ah, yeah it does.
Mike, how do you feel about this? Since there is already a provision to do it, perhaps providing some new interface using the facet registry would be a better way to go?
Best,
Jacob
#10
The current solution isn't a good one; it was more of a proof of concept and a starting point. Overall I think the MLT UI needs a lot of love. In my opinion, limiting this to a simple list of node types is very short-sighted.
I really like the idea of adding items from the facet registry, and combining them with a select list, radio buttons, checkboxes, or an autocomplete text field.
#11
Added a checkbox selection to the Apache Solr More Like This block so it is easier for people to customize their More Like This block.
Please review this patch. Diffed against the latest CVS checkout.
#12
MLT module is gone in the latest CVS (combined with framework module). Are you using the DRUPAL-6--1 branch?
#13
I checked out the latest head and I still see the MLT module? And I'm sure it is the HEAD. Please clarify?
#14
Right. Do not use HEAD - the active development branch is DRUPAL-6--1
#15
That explains.. :-)
#16
Happy to announce you can now do this with Apache Solr Views
http://drupal.org/cvs?commit=212234
#17
Now that MLT can be used in a view, are there any other changes that need to be done to close this issue?
#18
I'm thinking this "needs work" as opposed to review.
#19
@Scott in #16, do you mean that this issue can be closed? That a version of this was committed?
#20
Sorry, I added noise. I shouldn't have. I was just excited about this feature going into the Apache Solr Views project. This has not been committed to the Apache Solr Search Integration project.
#21
This is a visual review of #11.
This looks like a bug.
<?php
- $fields = array('mlt.mintf', 'mlt.mindf', 'mlt.minwl', 'mlt.maxwl', 'mlt.maxqt', 'mlt.boost', 'mlt.qf');
+ $fields = array('mlt.mintf', 'mlt.mindf', 'mlt.minwl', 'mlt.maxwl', 'mlt.maxqt', 'mlt.boost', 'mlt.fq');
?>
And if it is, it looks like it's still there:
<?php
// current apachesolr.module
try {
$solr = apachesolr_get_solr();
$fields = array(
'mlt_mintf' => 'mlt.mintf',
'mlt_mindf' => 'mlt.mindf',
'mlt_minwl' => 'mlt.minwl',
'mlt_maxwl' => 'mlt.maxwl',
'mlt_maxqt' => 'mlt.maxqt',
'mlt_boost' => 'mlt.boost',
'mlt_qf' => 'mlt.qf',
);
?>
These are extraneous comments, right?
<?php
+ // additional fq terms
+ // fq=+popularity:[10 TO *] +section:0
+ // in our case this will be fq=mlt_fq +type:blog +type:page
?>
Please pay attention to whitespace issues around Drupal coding style:
<?php
+ if(!empty($block['mlt_fqtype'])){
+ $subfq .= implode(' OR type:',$block['mlt_fqtype']);
+ 'fq' => $block['mlt_fq'].$subfq,
// Should look like this
+ if (!empty($block['mlt_fqtype'])) {
+ $subfq .= implode(' OR type:', $block['mlt_fqtype']);
+ 'fq' => $block['mlt_fq'] . $subfq,
?>
Isn't there a logic error here? If you implode on ' OR type:', won't the first one in the array not get the proper prefix?
<?php
+ $subfq .= implode(' OR type:', $block['mlt_fqtype']);
?>
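To make the concern concrete, here is a small standalone sketch, not taken from the patch. The helper name `example_type_filter()` is hypothetical. It shows what the glue string produces and one way to give every type its prefix. It also drops unchecked options, since a Drupal checkboxes element stores those as 0.

```php
<?php
$types = array('blog', 'page');

// Imploding on the glue string leaves the first entry without "type:".
$broken = implode(' OR type:', $types);
echo $broken, "\n"; // blog OR type:page

/**
 * Hypothetical helper: builds "type:a OR type:b" from checkbox values.
 */
function example_type_filter(array $checkbox_values) {
  $clauses = array();
  // array_filter() removes the 0 entries left by unchecked boxes.
  foreach (array_filter($checkbox_values) as $type) {
    $clauses[] = 'type:' . $type;
  }
  return implode(' OR ', $clauses);
}

echo example_type_filter(array('blog' => 'blog', 'page' => 'page', 'story' => 0)), "\n";
// type:blog OR type:page
```

When nothing is checked the helper returns an empty string, so the caller can skip adding a filter altogether.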
Why do we need the subquery text field if we're providing checkboxes?
<?php
+ $types = node_get_types('names');
+ $form['advanced']['mlt_fqtype'] = array(
+ '#type' => 'checkboxes',
+ '#title' => t('Subquery for selected types'),
+ '#description' => t('This can be used to filter the result set. Exampe. type:story will limit the suggestions to story nodes. Note: If the list is empty, all content types will be selected'),
+ '#options' => $types,
+ '#default_value' => isset($block['mlt_fqtype']) ? $block['mlt_fqtype'] : array(''),
+ );
+
+ $form['advanced']['mlt_fq'] = array(
+ '#type' => 'textfield',
+ '#title' => t('Results subquery'),
+ '#description' => t('This can be used to filter the result set. Exampe. type:story will limit the suggestions to story nodes'),
+ '#default_value' => isset($block['mlt_fq']) ? check_plain($block['mlt_fq']) : '',
+ );
?>
#22
While I understand mikejoconnor's desire to have a flexible system for these queries, and while jacobsingh points out that variables can be set in settings.php, and pwolanin points out that queries can be modified, I think the feature request for restricting by content type from the admin section is a valid request that fits the 80/20 rule of 80% of use cases with 20% of the work. To get in, the design requirement is that it can still be modified programmatically (via modify_query). I'm also moving this to the 6.2 branch.
#23
Here's an initial patch to expose this functionality to each More Like This block.
The patch does the following:
This permits the user to enter a criteria which would restrict the results. For example:
or
What I'd still like it to do:
One problem I'm having with allowing a simple keyword: I've attempted to do this (as seen in the commented-out code of the patch) by adding a subquery. For example:
$query = apachesolr_drupal_query('id:' . $id);
$sub_query = apachesolr_drupal_query('-title:strategies');
//apply any additional block specific filters to the query
foreach (explode(' ', $settings['mlt_res_criteria']) as $idx => $criteria) {
  list($field, $value) = explode(':', $criteria);
  if ($field && $value) {
    $sub_query->add_filter($field, $value);
  }
}
$query->add_subquery($sub_query);
The issue is that it doesn't seem to matter what I pass to apachesolr_drupal_query: as long as something is passed, the query returns nothing. If I instead leave that blank, but keep the subquery doing the filtering, that works fine.
#24
#25
From the comment above - the last patch doesn't work?
At one point there was more expanded functionality like this in the MLT module when it was separate. I guess I'm not sure whether this is a general site need, or a site-specific need that should be handled by a little bit of custom code.
#26
Hi, I'm not sure I understand the question. If you are asking me if the patch above works, are you referring to #11? If so, I'm not certain how that could work right now given that there is no separate apachesolr_mlt.module in the current release. The MLT block has been incorporated into the basic apachesolr.module.
The patch that I submitted in #23 basically exposes a text field where you can add some additional filtering for the MLT block. The rationale is that you might only want related content of a specific type to show up. I believe the patch with this basic functionality works.
Note: The patch is against the 6.x-1.0-RC2 release and the 2.0-dev code appears similar, so the patch might work there as well, though I have not tested it against 2.0.
The additional comments that I added afterwards were more about making the interface a bit nicer for the user, rather than just exposing a textbox. This piece might not be necessary; it is more of a nice-to-have.
Thoughts?
#27
@socki have you tried solving this need using apachesolr_views as per Scott Reynolds? http://drupal.org/project/apachesolr_views
For the apachesolr module I'd like to reiterate my design requirements:
- a per-block variable that can contain a filter string
- a getter and setter function for that variable that takes the block module/delta and knows how to set the variable name
- a way to parse and apply the contents of that variable to the query on any mlt block
- a series of checkboxes for content type on the block configuration form that allow the admin to limit the mlt suggestions to a specific content type. I still feel that content types will cover 80% of people's needs.
To keep the block form from clobbering whatever else comes along, the variable should store an array that has a structure something like this:
<?php
$mlt_query = array(
'form' => array('type:page', 'type:story'),
'custom' => array('uid:1', 'uid:3', 'tid:17'),
);
?>
The 'form' part is set by the block admin form and the 'custom' part is set by other modules calling the API (the getter setter functions mentioned above). At query time the whole thing is combined into one query.
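A rough sketch of how the getter, setter and combine step could fit together. Everything here is hypothetical: the `example_mlt_filter_*` names are made up, and an in-memory array stands in for Drupal's variable storage so the snippet runs on its own. How the combined filters are joined (AND vs. OR) is left open, since that is not settled above.

```php
<?php
// In-memory stand-in for persistent variable storage (assumption).
$GLOBALS['example_mlt_store'] = array();

/** Builds the per-block variable name from the block module and delta. */
function example_mlt_filter_name($module, $delta) {
  return 'example_mlt_filter_' . $module . '_' . $delta;
}

/** Getter: always returns both the 'form' and 'custom' groups. */
function example_mlt_filter_get($module, $delta) {
  $name = example_mlt_filter_name($module, $delta);
  $default = array('form' => array(), 'custom' => array());
  $stored = isset($GLOBALS['example_mlt_store'][$name]) ? $GLOBALS['example_mlt_store'][$name] : array();
  return $stored + $default;
}

/** Setter: replaces one group and leaves the other untouched. */
function example_mlt_filter_set($module, $delta, $group, array $filters) {
  $value = example_mlt_filter_get($module, $delta);
  $value[$group] = array_values($filters);
  $GLOBALS['example_mlt_store'][example_mlt_filter_name($module, $delta)] = $value;
}

/** Combines both groups into one flat list of filter strings. */
function example_mlt_filter_combined($module, $delta) {
  $value = example_mlt_filter_get($module, $delta);
  return array_values(array_unique(array_merge($value['form'], $value['custom'])));
}

example_mlt_filter_set('apachesolr', 'mlt-001', 'form', array('type:page', 'type:story'));
example_mlt_filter_set('apachesolr', 'mlt-001', 'custom', array('uid:1', 'uid:3', 'tid:17'));
print_r(example_mlt_filter_combined('apachesolr', 'mlt-001'));
```

Because the block form only ever writes the 'form' group, saving the form again does not clobber filters that other modules stored under 'custom'.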
#28
Here is the patch that I'm working on. I have attached two separate files, though the logic for both is identical. Basically I have two things that I'm trying to get accomplished here. As per the discussion above, I'm hoping that something along these lines can find its way into the module going forward.
1) The handling of the fields is nearly how it was described by @robertDouglass. The variation is that, rather than creating separate _get and _set functions, the two additional fields were added to the serialized structure that the rest of each block's data is stored in. This was done easily by assigning a default value in the apachesolr_mlt_block_defaults function and adding the corresponding fields to the apachesolr_mlt_block_form function.
This part works in both the 1.x and 2.x patches.
2) The MLT block is then filtered by adding some code to the apachesolr_mlt_suggestions function. The approach currently taken is as follows:
//if types available via array
if (is_array($settings['mlt_res_types'])) {
  $_apply_sub = FALSE; //by default we will not apply the subquery
  $sub_query = apachesolr_drupal_query();
  //loop over content type restrictions
  foreach ($settings['mlt_res_types'] as $type => $enabled) {
    //if at least one content type was selected, then we'll limit results based on them
    if ($enabled) {
      $_apply_sub = TRUE;
      $sub_query->add_filter('type', $type);
    }
  }//end - loop
  //if we're restricting the results
  if ($_apply_sub) {
    $query->add_subquery($sub_query);
  }
}//end - is array
The code above loops over the content types enabled for the particular MLT block and adds it as a filter. This should suffice for about 80% of users of the block.
//if there was an additional criteria specified
if (trim($settings['mlt_res_criteria']) !== "") {
  $_criterias = explode(' ', trim($settings['mlt_res_criteria']));
  $sub_query = apachesolr_drupal_query();
  foreach ($_criterias as $idx => $_criteria) {
    if ($_str_pos = strpos($_criteria, ':')) {
      //field:value pair, applied as a filter on that field
      list($field, $value) = explode(':', $_criteria);
      $sub_query->add_filter($field, $value);
    } elseif ($_str_pos = strpos($_criteria, '^')) {
      //boosted term, passed through as a boost query
      $params['bq'][] = $_criteria;
    } else {
      //bare keyword, filtered against title and body
      $sub_query->add_filter('title', $_criteria);
      $sub_query->add_filter('body', $_criteria);
    }
  }
  $query->add_subquery($sub_query);
}//end - if
As an added bonus, users would have the ability to tweak the results even further by entering additional criteria. The way this currently functions is that it lets users write basic Solr queries, which it then breaks apart and parses as needed to allow for boosting of terms and additional keywords.
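For illustration, the token classification can be tried outside the module with a standalone sketch. The function name `example_parse_criteria()` and the shape of the returned array are hypothetical. It only sorts tokens into field filters, boost terms and bare keywords, and does not touch any query object.

```php
<?php
/**
 * Hypothetical helper: splits a criteria string into field filters,
 * boosted terms and bare keywords.
 */
function example_parse_criteria($criteria) {
  $parsed = array('filters' => array(), 'boosts' => array(), 'keywords' => array());
  // Split on any run of whitespace and skip empty tokens.
  $tokens = preg_split('/\s+/', trim($criteria), -1, PREG_SPLIT_NO_EMPTY);
  foreach ($tokens as $token) {
    $colon = strpos($token, ':');
    if ($colon !== FALSE && $colon > 0) {
      // field:value pair; the limit keeps colons inside the value intact.
      list($field, $value) = explode(':', $token, 2);
      $parsed['filters'][] = array($field, $value);
    }
    elseif (strpos($token, '^') !== FALSE) {
      // Boosted term such as solr^2.
      $parsed['boosts'][] = $token;
    }
    else {
      $parsed['keywords'][] = $token;
    }
  }
  return $parsed;
}

print_r(example_parse_criteria('type:story solr^2 drupal'));
```

One limitation of splitting on whitespace is that values containing spaces, such as range queries, are broken into separate tokens, so this approach only covers simple criteria.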
This is only functioning in the 1.x branch.
The reason this does not appear to work in the 2.x branch is that the logic within the apachesolr_modify_query function is different. In the 1.x branch, queries are added to the parameters like this:
if ($query && ($fq = $query->get_fq())) {
  $params['fq'] = $fq;
}
In the 2.x branch, the queries are parsed and added to the parameters like this:
if ($query && ($fq = $query->get_fq())) {
  foreach ($fq as $delta => $values) {
    foreach ($values as $value) {
      $params['fq'][$delta][] = $values;
    }
  }
}
It seems the issue may be because $values in the 2.x version is expected to be an array, but it is not.
The question is: in the 2.x implementation, how should I be adding these additional filters so that they can be parsed and subsequently applied correctly?
Thanks in advance for your help.