Summary

This module provides a Sphinx backend for the Search API module.

It uses Sphinx real-time (RT) indexes and sphinxQL (requires Sphinx 2.0.3-release).
It currently supports the "Search API facets", "Search API facets OR" and "Search API More Like This" features.

Project Page

http://drupal.org/sandbox/orakili/1443484

Git Command

git clone --branch 7.x-1.x http://git.drupal.org/sandbox/orakili/1443484.git

PAReview

http://ventral.org/pareview/httpgitdrupalorgsandboxorakili1443484git-7x-1x

Drupal Version

Drupal 7

Reviews

Comments

klausi’s picture

Issue tags: -PAreview: review bonus

Removing review bonus tag, there are not manual reviews listed in the issue summary. Please read #1410826: [META] Review bonus again.

orakili’s picture

Ah sorry about that. I misread and thought it was related to the PAReview only.

orakili’s picture

Priority: Normal » Major

Changing priority as it's been more than 4 weeks.

zenlan’s picture

Status: Needs review » Needs work

Can imagine why its taking so long to get a review, any module that requires the installation of third-party services will probably put off some reviewers. Problem with the review process not your module though. Am never sure if we are conducting code reviews or user testing. Anyway, as I write Search API service classes myself and use Solr a lot I thought I'd give Sphinx a go. The usual kerfuffle using a service for the first time but not the worst I've encountered (Sphinx I mean, not your module).

PAREVIEW:

Returns no errors
http://ventral.org/pareview/httpgitdrupalorgsandboxorakili1443484git-7x-1x

INSTALLATION:

Smooth, no errors.

USAGE / WORKFLOW:

Created a server, no problems. The correct server path was created.

Created an index, no problems but could use a little bit of helpful text to guide user through the 'workflow'. You have some information on the create server page and in the README.txt that would be useful on the index pages instead, imo, if it is possible to get that info across there (haven't checked myself).

'Index Now' threw an exception in db_query:

PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 unknown local index 'sphinx_index_mysite_sphinx_test_index_rt' in search request: SELECT id FROM sphinx_index_mysite_sphinx_test_index_rt WHERE id IN (1) LIMIT 1; Array ( ) in SearchApiSphinxService->querySphinx() (line 1242 of E:\xampp\htdocs\mysite\sites\all\modules\search_api_sphinx\service.inc).

The table "sphinx_index_mysite_sphinx_test_index_rt" did not exist in my db and I couldn't find any code that created such a table. My db user has permissions to create tables, I debugged through creating an index and didn't see a table get created.

The configuration file was created and contents appeared correct (using all defaults) but the specified index path was not created on the filesystem: (yes I am using Windows here btw)

index sphinx_index_adlibtools_sphinx_test_index_rt {
	type = rt
	path = e:\var\sphinx/sphinx_server_mysite_sphinx_test/sphinx_index_mysite_sphinx_test_index_rt

MINOR/COSMETIC:

Missing default dir on Search API create/edit server config page:
"Base directory in which the necessary files and directories will be created. The directory must exist and the application user The Boss must have full access. Default is"

Typo on Search API create/edit index config page:
'MORE LIKE THIS' SETTINGS Minimim word length

GENERAL COMMENTS:

I found the code well-structured, easy to read and understand. Having never used Sphinx before but plenty of experience with Search API I didn't have any trouble figuring out what to do. Perhaps less savvy users might like a bit more help but you will find out what they need from the issue queue no doubt. :)

Perhaps defining all of those configuration defaults every time the module/service class is loaded is a bit unnecessary, they are only used by the configuration form and could probably be simple local variables or even just hard-coded assignments.

I would add 'write some tests' to your ToDo list, even though testing in Drupal Simpletest is less fun that watching paint dry. Useful nonetheless.

Haven't yet tried the facets or mlt features.

Am happy to continue testing this if you can help me get over the indexing bug. It looks like a very useful module. :)

klausi’s picture

Priority: Major » Normal

Setting priority according to http://drupal.org/node/894256

orakili’s picture

Thank you very much Zenlan for taking time to review this module.

I'm setting up a windows environment to solve the issues you encountered. They seem to be related to the way the sphinx search service is started.

I will update the code once the testing is finished. Thanks for you patience.

orakili’s picture

How to start/stop the sphinx service under windows has been changed (commit 28f5549ed0bcf02b2ac26c4e02b82fbcc5abc4fa).

It should work properly now.

zenlan’s picture

Hello,

I can index now! No more exception. :)

I have created a view based on a search_api_sphinx index:

Sphinx Test Index

Status
enabled (disable)
Machine name
sphinx_test_index
Item type
Node
Server
Sphinx Test Server
Index options
Cron batch size
50 items per cron batch.
Indexed fields
Node ID, Content type, Title (1.0 x), Status, Author, Item language, The main body text » Text (1.0 x), Tags » Name (1.0 x)

The fulltext search query string generated by this index...
SELECT *, weight() AS score FROM sphinx_index_adlibtools_sphinx_test_index_rt WHERE MATCH('@(sphinx_title,sphinx_body___value) (\"Two\")') ORDER BY sphinx_author ASC LIMIT 0,10
... is throwing an error: "Non recognized field type list" and no results are being returned. My term is in the node body text. Terms in the title are returning correct results in fulltext search despite the error.

Sorting on 'Relevance' returns an error: "Invalid sort criteria on unknown or unindexed field 'search_api_relevance'"

Using 'Tags' as a filter returns an error: "Invalid filter on unknown or unindexed field 'field_tags:name'"

Using 'Title' as a filter returns error: "Invalid filter operator '='. Field: 'title'."

I will leave it at that for now, still needs some polishing up with regard to Views integration but it is rather good. :)

zenlan’s picture

I should point out that this is a minimal Drupal 7 installation with only content types Article and Page and one simple 'Tags' vocabulary.

andypost’s picture

orakili’s picture

Thanks both of you for helping with the review.

#1536866 has been fixed and was due the use of the tokenizer preprocessor which was not handled as Sphinx has its own preprocessors.

Most of the errors encountered with views are due to unsupported features.

It's not possible to sort using a field indexed as fulltext because Sphinx doesn't store the string (ex: tags:name). If the field has only a sorting or filtering purpose it should be set as string instead of fulltext.

I will add that to the documentation and make the error messages more explicit.

zenlan’s picture

Status: Needs work » Reviewed & tested by the community

Hi,

The #1536866 issue harks back to my comment earlier:

Created an index, no problems but could use a little bit of helpful text to guide user through the 'workflow'. You have some information on the create server page and in the README.txt that would be useful on the index pages instead, imo, if it is possible to get that info across there (haven't checked myself).

I recalled that you wrote on the server info page...

Will use internal Sphinx preprocessors, so Search API preprocessors should for the most part be deactivated.

...so I knew not to enable any processors. I realise that Search API doesn't really consider that not all index types support or require all of the search_api workflow elements such as processors, it would be great if there was a hook_default_search_api_index_info() method where you could add some info or the user as per the server page. I wonder if you can do something with hook_default_search_api_index_alter() to remove unsupported processors?

'relevance' is a special field, its not fulltext, you should be able to sort on it. If you take a look at the service class for search_api_solr you can see how it maps the Solr 'score' to the search_api_relevance field. I am working on a service class that does not return any kind of 'score' but I still handle that special field by assigning it with 0 always. The field is rendered effectively useless but at least it doesn't simply throw an error.

I know what you mean about the datatype dictating the function. Just an idea, but perhaps hook_views_data_alter() might be useful, possibly assigning your own filter handler to special fields? Is there any way of tweaking the query to allow a fulltext field to act as a filter? (I do this in one of my service classes)

Anyway, I don't think that fine-grained views integration or search_api api constraints should stop your module getting through, adding documentation is imo sufficient at this point. Going forward, these things might get resolved in the issue queue if users actually require them and the search_api maintainers will no doubt help along the way.

I consider this module to be in good enough shape to be accepted.

klausi’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +PAreview: security

You have not listed any reviews of other project applications in your issue summary as strongly recommended in the application documentation.

manual review:

  1. search_api_sphinx_search_api_service_info(): do not pass variables to t() where possible as this cannot be picked up by translation extraction tools.
  2. configurationFormSubmit(): if you just call the method in the parent class you can omit the method entirely.
  3. "SELECT id FROM {$index_name} WHERE id IN (" . implode(',', $id_list) . ") LIMIT {$limit}";: do not concatenate variables directly into queries to avoid SQL injection. Also elsewhere. See http://drupal.org/writing-secure-code . querySphinx() should have a second parameter that holds the replacements as expected by db_query().
  4. deleteSphinxServerDirectory(): maybe you want to use file_unmanaged_delete_recursive() here?
  5. "watchdog('search api sphinx', 'Non recognized field type !type', array('!type' => $type));": always use the "@" placeholder to automatically sanitize any variable that should be plain text anyway.

But otherwise this looks nearly ready to me.

orakili’s picture

Thank you for pointing out those problems.

Issues (1), (2) and (5) have been fixed. For (4), file_unmanaged_delete_recursive() can not do the work as I'm deleting several files with different extensions for which only the basename is known so it would be more complex than GlobIterator.

Still working on (3). I'm using db_query to communicate with Sphinx which emulates the mysql protocol and has its own syntax (SphinxQL) which is close to standard SQL but has some differences especially regarding escaped values.

orakili’s picture

Priority: Critical » Normal

I've reviewed the security issues and committed some changes. Note that I could not use db_query and placeholders for several reasons related to the fact the syntax used is not true SQL but SphinxQL which has its own specificity.

- db_query calls DatabaseConnection::query() which use PDOStatement::execute(). As stated in the documentation page, all arguments passed through execute() are treated as strings. Thus everything is quoted or SphinxQL doesn't not accept quoted values other than strings.

- I couldn't use either PDOStatement bind parameters or values (see PDOStatement::bindValue()) because PDO doesn't handle unsigned integers (see bug https://bugs.php.net/bug.php?id=44639). Unsigned integers are an important part of the SphinxQL language especially for string comparison using CRC32 function.

So values are sanitized using the function SearchApiSphinxService::convertFieldValueToSphinxValue() and some others, which check the type of the value and prepare it accordingly so it's safe to use in queries.

orakili’s picture

Issue summary: View changes

Now supports Search API MLT.

andypost’s picture

Status: Needs work » Needs review

I'd like to mension about SphinxQL - it does not support table-aliases so @orakili is right that PDO and dbAPI has a lot issues when communicationg with Sphinx.

My +1 to promote this project to full!

avpaderno’s picture

Priority: Normal » Critical
orakili’s picture

Priority: Normal » Critical
Issue tags: -PAreview: security

Removed "PAReview: security". See #15.

orakili’s picture

Issue summary: View changes

Added reviews section

orakili’s picture

Issue tags: +PAreview: review bonus

Added reviews of modules.

klausi’s picture

Priority: Critical » Normal
Status: Needs review » Reviewed & tested by the community
Issue tags: -PAreview: review bonus +PAreview: security

Please don't remove the security tag, we keep that for statistics and to show examples of security problems.

manual review:
most sprintf() %s calls are useless as PHP casts variables to strings anyway when concatenating.

Looks RTBC to me. Removing review bonus tag, you can add it again if you have done another 3 reviews of other projects.

klausi’s picture

Status: Reviewed & tested by the community » Fixed

No objections for more than a week, so ...

Thanks for your contribution, orakili! Welcome to the community of project contributors on drupal.org.

I've granted you the git vetted user role which will let you promote this to a full project and also create new projects as either sandbox or "full" projects depending on which you feel is best.

Thanks, also, for your patience with the review process. Anyone is welcome to participate in the review process. Please consider reviewing other projects that are pending review. I encourage you to learn more about that process and join the group of reviewers.

As you continue to work on your module, keep in mind: Commit messages - providing history and credit and Release naming conventions.

Thanks to the dedicated reviewer(s) as well.

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Additional reviews