
Comments

funana

Version: 6.x-2.x-dev » 6.x-2.0-beta1

Sorry, I changed the version...

jordojuice

Thanks for the link to the patch.
You're right. The last time I worked on the issue queue, both Similar Entries and Similar by Terms were using Views 2. I'm going to have to take down both 2.x versions of Similar Entries until this is fixed. The Similar by Terms patch should go a long way toward the port, though.

jordojuice

This issue will track progress toward resolving the Views integration issue in the Similar Entries 2.x branches.

jordojuice

This patch is based on the Similar by Terms patch you linked to, which made a big difference. I applied its changes to the 7.x branch and got the expected results, so this patch is for 7.x-2.x. I will try the same on 6.x-2.x.

jordojuice

Status: Active » Needs work

Hmm... I just realized the most important part of the view is not working with this patch: sorting by similarity. Obviously, the view should be sorted based on the similarity score returned by the query. This needs work.

jordojuice

A couple of commits were made to both the 6.x-2.x and 7.x-2.x branches to fix the similarity score issue. However, we currently have no way of viewing the similarity scores within Views to test whether the queries return accurate results. Earlier versions of the Views integration did include a field for displaying the similarity score, so I may reuse that code to check that the scores are returned and ordered correctly. Anyone who wants to test the 6.x-2.x or 7.x-2.x branches can only get them via git until the Views integration has been sufficiently tested again.

jordojuice

Status: Needs work » Needs review
File size: 952 bytes

This patch adds a field for the similarity score; with it, I verified that 6.x-2.x works correctly in my development environment.
However, the 7.x-2.x branch still needs work to accommodate the new changes: the query as built by Views currently returns similarity scores of all zeros.

jordojuice

Alright, I've put together a patch for the 7.x branch as well. It also adds the similarity score field, and it changes all Similar Entries queries to BOOLEAN MODE. Words in node titles are given more weight in the boolean query.
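To make the title weighting concrete, here is a minimal Python sketch of how such a BOOLEAN MODE search string could be assembled. It assumes the weighting uses MySQL's boolean-mode `>` operator, which raises a word's contribution to the relevance score; the function names (`tokenize`, `build_boolean_search`) are hypothetical, and this is not the module's actual PHP code, which isn't shown in this thread.

```python
import re

# Characters with special meaning in a MySQL BOOLEAN MODE search string.
_OPERATOR_CHARS = re.compile(r'[+\-<>()~*"@]')


def tokenize(text: str) -> list[str]:
    """Lowercase, strip boolean operators, and return unique words in order."""
    cleaned = _OPERATOR_CHARS.sub(" ", text.lower())
    seen: dict[str, None] = {}
    for word in cleaned.split():
        seen.setdefault(word, None)
    return list(seen)


def build_boolean_search(title: str, body: str) -> str:
    """Build an AGAINST(... IN BOOLEAN MODE) search string (hypothetical helper).

    Title words get the '>' operator so they count more toward relevance;
    body words are left as optional terms with no operator.
    """
    title_words = tokenize(title)
    body_words = [w for w in tokenize(body) if w not in title_words]
    return " ".join([f">{w}" for w in title_words] + body_words)


print(build_boolean_search("Drupal Views", "views integration for drupal"))
```

The resulting string would be passed as a bound parameter to something like `MATCH(title, body) AGAINST(%s IN BOOLEAN MODE)`; MySQL still applies its own stopword list and minimum word length to it.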

jordojuice

Status: Needs review » Fixed

Committed to both branches.

udvranto

Subscribing...

jordojuice

Results from anyone testing the 6.x-2.x or 7.x-2.x branches would be greatly appreciated. Both versions work perfectly in my own development environment with Views 3. In particular:

  • Any warnings or error messages
  • Accuracy of the similar entries lists built by the Views query
  • Feedback on the scoring system provided in the similarity field
  • The stability of the default block view
funana

Hi Jordan,

Thank you for releasing the patch.

[Drupal Pressflow 6.6.23.0 environment]
- installed the module with the patched file
- ran update.php (even though it didn't show me an available update)
- no warnings or errors in dblog
- activated the View: similar_entries block
- flushed all caches
- no block showing up...

- checked the view and tried the preview; it shows results just as it should
- added "teaser" to the block fields; the preview loads forever, then shows:

"An error occurred at /admin/build/views/ajax/preview/similar_entries.
Error Description: 0: "

- found that "show items" in the basic settings was set to "unlimited", so I changed it to "5"
- hit save and "preview", which gives me this error:

An error occurred at /admin/build/views/ajax/display/similar_entries/default/pager_options.
Error Description: 0:

- flushed all caches again
- the block still doesn't display on nodes, but the preview in Views works perfectly when tested with node id=11
- tried higher node IDs in the preview, which returned empty results, so I guess the block doesn't show up on new nodes until cron has calculated their similarity scores (??)

My 2 cents:
- the query takes forever; my server log shows "Script timed out before returning headers: php-cgi, referer: http://server.com/admin/build/views/edit/similar_entries", and the server is extremely slow now.
- had to deactivate the block because the server stopped responding

Maybe I will give it another try later. I have to do some other tasks now, but I wanted to let you know my first test results.

funana

I couldn't resist trying again: I uninstalled the module, deleted the view, and installed it once more.

This time I used the beta 2 version, which is already patched.
- Block is displayed on nodes
- No block on newer nodes, possibly because they have no score yet
- Added the teaser to the fields; timeouts again
- query seems to be super heavy...

Other than that it seems to work ^^

jordojuice

Version: 6.x-2.x-dev » 6.x-2.0-beta1
Status: Needs work » Fixed

You're right that the query is unusually slow. I checked the query times when viewing a node on a test site with only 500 nodes, and they were already far too long for reasonable use on a production site, so this will be an especially serious problem on large sites. There are a few reasons the query could be running slowly:

  • Some tables or fields may not be indexed by Similar Entries. FULLTEXT searches can be executed on fields that aren't indexed, but this can and will lead to significantly longer query times. However, the module is designed to try to avoid querying tables and columns that aren't indexed.
  • Using a BOOLEAN MODE search may affect query time as well. However, 6.x-2.x doesn't use BOOLEAN MODE at this time; 7.x-1.x and 7.x-2.x do.
  • Adding modifiers to the search text (the node's body) in the query may also change query time.

This is certainly a significant issue, and I'll have to experiment with the FULLTEXT query to bring the time down. But at least the Views integration issues appear to be much better.
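The first two points can be illustrated with a small Python sketch. MySQL requires the `MATCH()` column list to correspond to a FULLTEXT index definition, except for BOOLEAN MODE searches on MyISAM tables, which can run on unindexed columns but are likely to be slow. The sketch assumes a hypothetical `node_revisions` table searched on `(title, body)`; the helper names are hypothetical, and the index check is deliberately conservative (it requires the exact column order).

```python
from typing import Iterable, Sequence


def has_matching_fulltext_index(
    columns: Sequence[str], fulltext_indexes: Iterable[Sequence[str]]
) -> bool:
    """Conservative check: some FULLTEXT index lists exactly these columns, in order."""
    wanted = tuple(columns)
    return any(tuple(index) == wanted for index in fulltext_indexes)


def build_similarity_query(columns: Sequence[str], boolean_mode: bool) -> str:
    """Build a parameterized FULLTEXT similarity query (hypothetical schema).

    Placeholders, in order: search text (score), search text (WHERE),
    current node ID to exclude, and the row limit.
    """
    mode = " IN BOOLEAN MODE" if boolean_mode else ""
    match = f"MATCH({', '.join(columns)}) AGAINST(%s{mode})"
    return (
        f"SELECT nid, {match} AS score FROM node_revisions "
        f"WHERE {match} AND nid <> %s "
        "ORDER BY score DESC LIMIT %s"
    )


print(build_similarity_query(["title", "body"], boolean_mode=True))
```

If `has_matching_fulltext_index` returns False for the columns a query uses, that query is a likely candidate for the slow, unindexed scans described in the first point.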

jordojuice

Version: 6.x-2.0-beta1 » 6.x-2.x-dev
Status: Fixed » Needs work

I just uninstalled and reinstalled 6.x-2.x and Views 3. After the reinstall, the default view was not returning results. But once I changed the similarity score filter from score > 1 to score > 0, it returned results with scores higher than 1. So there may be an issue with how the similarity score is calculated, or the default view needs some work.

jordojuice

Version: 6.x-2.0-beta1 » 6.x-2.x-dev
Status: Fixed » Needs work

I did some tests and monitored the query execution time. It hovers around 50ms (fresh install with 1k nodes) in all new versions of the module (6.x-2.x, 7.x-1.x, 7.x-2.x), so it's independent of Views, and there was no significant difference whether or not the search used BOOLEAN MODE or modifiers (like the ones I use to give the node title more weight). These are certainly slow as queries go, but not slow enough to cause a timeout. Caching should be implemented to significantly improve load time. I'll have to run the same tests on the 6.x-1.x branch to compare against the earlier version's query times. But I suspect that another issue comes into play when teasers are added to the display, so I'll have to test that as well. Also, after a clean install in a test environment, the query is only executed for the page node type and not for story, which must be a view configuration issue.
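As a rough illustration of why caching helps here, the Python sketch below keeps each computed result for a fixed lifetime, which is roughly what a time-based cache in Views does for query results. `TTLCache` and `run_similarity_query` are hypothetical names; a real site would rely on Views or Drupal's cache layer rather than code like this.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keep computed values for `lifetime` seconds, keyed by e.g. node ID."""

    def __init__(self, lifetime: float, clock: Callable[[], float] = time.monotonic):
        self.lifetime = lifetime
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.lifetime:
            return entry[1]  # still fresh: skip the expensive query
        value = compute()  # missing or expired: run the query again
        self._entries[key] = (now, value)
        return value
```

For example, `cache = TTLCache(6 * 3600)` followed by `cache.get_or_compute(nid, lambda: run_similarity_query(nid))` would run the hypothetical query at most once per node every six hours, so only the first page view after expiry pays the full query cost.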

jordojuice

Status: Needs work » Needs review
File size: 3.3 KB

This patch for 6.x-2.x fixes the issue with the similarity filter and improves the similarity score field. The filter now filters scores correctly, and the field has extra options for rounding decimals and correctly calculates and displays the score.
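The behaviour described for the filter and field can be sketched in Python on rows that have already been fetched. This is a hypothetical illustration, not the module's handler code: `select_similar` applies a strict minimum-score filter, sorts by score (highest first), and caps the list at an assumed default of 8, while `format_score` rounds the score for display.

```python
from typing import Dict, List


def select_similar(rows: List[Dict], minimum: float = 0.0, limit: int = 8) -> List[Dict]:
    """Keep rows scoring strictly above `minimum`, best first, at most `limit` rows."""
    kept = [row for row in rows if row["score"] > minimum]
    kept.sort(key=lambda row: row["score"], reverse=True)
    return kept[:limit]


def format_score(score: float, decimals: int = 2) -> str:
    """Render a similarity score with a configurable number of decimal places."""
    return f"{score:.{decimals}f}"


rows = [{"nid": 1, "score": 0.5}, {"nid": 2, "score": 3.25}, {"nid": 3, "score": 1.7}]
for row in select_similar(rows, minimum=1):
    print(row["nid"], format_score(row["score"], 1))
```

In practice the filter and limit belong in the SQL query itself so that fewer rows are fetched; the Python version just makes the intended logic easy to check.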

jordojuice

This patch does the same as the last one, but for 7.x-2.x. It also fixes the errors that occurred when editing the similarity score filter. However, Views seems to insist on formatting the similarity score select list improperly.

funana

Looks good, thank you.

I don't know why, but the view was set to "unlimited" items again; I had to change it back to a reasonable number ;)

And I strongly recommend setting the cache under "Advanced settings" in the view. I set it to 6 hours/6 hours and the query execution time went from over 21,000ms to 21ms, muhaaa ^^

PS: Ahh, and I forgot to say that it's still the best module for showing similar nodes in terms of relevance. For me, this is a must-have module!

jordojuice

@funana Thanks for checking it out. You've been a huge help in working out these bugs.

>> still the best module...
I love it too. I also love Views, which is why I want to get the 2.x versions of Similar Entries stable for use on production sites. Last time I checked, it was the only related-content module with Views integration for D7.

Items still todo for Views integration:

  • Limit query results in the Similar Entries default view.
  • Enable caching in the default view.
  • Fix improperly formatted select list in similarity filter - 7.x-2.x.
  • Write tests for similar_cron() and related functions. The tests will check that the correct tables are indexed properly, thereby ensuring that all nodes and fields are searched and that query execution times stay minimal.

Potential additional features:

  • Support the display of similar entries for entities other than nodes.
  • Make better use of boolean mode by providing options for increasing or decreasing the weight of titles or other fields.
jordojuice

Status: Needs review » Needs work

#17 committed to 6.x-2.x.
#18 needs work.

jordojuice

Status: Needs work » Needs review
File size: 4.3 KB

This patch for 7.x-2.x builds on the previous patch by fixing the issue with the similarity score filter's interface.

jordojuice

This patch fixes all the previously mentioned issues in 7.x-2.x (except for the tests). I rewrote the default view so that it displays on article node types, implements caching by default, and returns a maximum of 8 entries. The similarity score filter now uses a properly formatted select list as well. The patch also adds an option in the argument handler to increase the weight of words in the node title, and it makes the default block view more standard. As far as I can tell, this patch will make the 7.x-2.x branch quite stable.

jordojuice

And here's a backport for 6.x-2.x.

jordojuice

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.