We'd like to invite all of you to test this patch and comment with corrections, but not with requests for new functionality.

What does this patch do?
It adds multi-entity support while maintaining the current functionality. It does not add any other functionality.

How to use it?

function my_module_apachesolr_entity_info_alter(&$entity_info) {
  $entity_info['myentity']['indexable'] = TRUE;
  $entity_info['myentity']['status callback'] = 'my_module_status_callback';
  $entity_info['myentity']['document callback'][] = 'my_module_document';
  $entity_info['myentity']['reindex callback'] = 'my_module_reindex';

  // Following values are optional
  $entity_info['myentity']['index_table'] = 'apachesolr_index_entities_myentity';
  $entity_info['myentity']['cron_check'] = 'my_module_cron_check';
  $entity_info['myentity']['apachesolr']['result callback'] = 'my_module_result_processing';
}

See comments above for examples for User, Terms and Profile2
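
As a rough illustration of how an indexer could consume these values, the sketch below filters an entity info array down to the types that are flagged as indexable and declare the three non-optional callbacks. The helper name example_get_indexable_types() is hypothetical and not part of the patch; only the array keys are taken from the snippet above.

```php
<?php
/**
 * Hypothetical helper (not part of the patch): returns the entity types that
 * are flagged as indexable and declare every required callback.
 */
function example_get_indexable_types(array $entity_info) {
  $required = array('status callback', 'document callback', 'reindex callback');
  $types = array();
  foreach ($entity_info as $entity_type => $info) {
    // Skip entity types that were never flagged as indexable.
    if (empty($info['indexable'])) {
      continue;
    }
    // Skip entity types that are missing any of the required callbacks.
    foreach ($required as $key) {
      if (empty($info[$key])) {
        continue 2;
      }
    }
    $types[] = $entity_type;
  }
  return $types;
}

$entity_info = array(
  'myentity' => array(
    'indexable' => TRUE,
    'status callback' => 'my_module_status_callback',
    'document callback' => array('my_module_document'),
    'reindex callback' => 'my_module_reindex',
  ),
  // No 'indexable' flag, so this type is ignored.
  'other' => array(),
);
$indexable = example_get_indexable_types($entity_info);
```

Whether the real module validates the callbacks this strictly is an assumption; the point is only that the optional keys can be absent without making a type non-indexable.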

Todo:

  1. Review the patch to make sure we do not remove existing functionality or bugfixes
  2. Add the comments back to the $extra var for the node indexing
  3. apachesolr_index_get_entities_to_index only accepts an integer ID. Should be fixed or documented clearly. (Patch #131)
  4. Convert apachesolr_mark_node($nid) to apachesolr_mark_entity($entity_id, $entity_type) but keep apachesolr_mark_node for backwards compatibility. (Patch #131)
  5. Add entity support for the apachesolr_drush_solr_delete_index drush command. (Patch #131)
  6. Convert apachesolr_node_type_update($info), apachesolr_taxonomy_term_update($term), apachesolr_user_update(&$edit, $account, $category). (Patch #131)
  7. Review the API docs to reflect the new hooks/alters
  8. Find a better name for hook_apachesolr_index_get_entity_defaults_alter. (Changed to hook_apachesolr_entity_info_alter)
  9. Make sure we remove as many $env_id = NULL as possible to make it more robust. (See comment #110)
  10. Backwards compatibility with apachesolr_get_nodes_to_index (for apachesolr_attachments module)
  11. When enabling a new entity type, it should add all the content of that type as remaining to the list
  12. Remove all the unneeded module_load_include calls and move the module_load_include into a hook_init
  13. apachesolr_index_mark_all_for_reindex needs entity type support
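
The backwards-compatibility idea in item 4 can be sketched as a thin wrapper. This is a minimal, standalone illustration: the example_ function names and the in-memory $example_marked list are assumptions made so the snippet runs on its own, and the real functions would update the index tables instead.

```php
<?php
/**
 * Hypothetical stand-in for the proposed apachesolr_mark_entity(): records
 * that an entity needs to be reindexed. Uses an in-memory list for the demo.
 */
function example_mark_entity($entity_id, $entity_type) {
  global $example_marked;
  $example_marked[$entity_type][$entity_id] = TRUE;
}

/**
 * Backwards-compatible wrapper: existing callers that only know about nodes
 * keep working, and the call is forwarded to the generic function.
 */
function example_mark_node($nid) {
  example_mark_entity($nid, 'node');
}

// Old-style and new-style calls end up in the same structure.
example_mark_node(12);
example_mark_entity(7, 'user');
```

Keeping the old function as a one-line wrapper means contrib modules that still call it do not hit an undefined function error.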
Comment | File | Size | Author
#135 | 966796-135.patch | 2.3 KB | nick_vh
#134 | 966796-134.patch | 1.95 KB | nick_vh
#131 | 966796-131.patch | 6.6 KB | nick_vh
#128 | 966796-127.patch | 1.51 KB | nick_vh
#127 | 966796-127.patch | 445 bytes | swentel
#126 | 966796-126.patch | 6.72 KB | nick_vh
#124 | 966796-123.patch | 614 bytes | swentel
#123 | 966796-123.patch | 2.51 KB | nick_vh
#122 | 966796-122.patch | 4.52 KB | nick_vh
#121 | 966796-121.patch | 139.53 KB | nick_vh
#118 | 966796-118.patch | 139.39 KB | nick_vh
#117 | 966796-117.patch | 139.04 KB | nick_vh
#116 | 966796-116.patch | 138.22 KB | nick_vh
#114 | 966796-112.patch | 137.63 KB | nick_vh
#113 | 966796-112.patch | 137.63 KB | nick_vh
#111 | 966796-111.patch | 136.53 KB | nick_vh
#110 | 966796-110.patch | 131.42 KB | nick_vh
#107 | 966796-107.patch | 126.22 KB | nick_vh
#101 | 966796-101.patch | 115.31 KB | BarisW
#99 | 966796-96-interdiff.patch | 5.71 KB | nick_vh
#98 | 966796-96-interdiff.patch | 32.6 KB | nick_vh
#97 | 966796-97.patch | 113.93 KB | BarisW
#96 | 966796-96.patch | 113.76 KB | BarisW
#92 | 966796-92.patch | 109.92 KB | nick_vh
#91 | 966796-91.patch | 108.37 KB | nick_vh
#90 | 966796-90.patch | 105.4 KB | nick_vh
#89 | 966796-89.patch | 105.46 KB | nick_vh
#88 | 966796-88.patch | 104.61 KB | nick_vh
#87 | 966796-87.patch | 101.23 KB | nick_vh
#85 | 966796-85.patch | 101.67 KB | nick_vh
#84 | 966796-84.patch | 101.67 KB | nick_vh
#82 | 966796-82.patch | 99.87 KB | nick_vh
#81 | 966796-80.patch | 90.71 KB | nick_vh
#79 | 966796-78.patch | 91.76 KB | nick_vh
#80 | term and user indexer support.zip | 5.54 KB | nick_vh
#77 | 966796-77.patch | 100.45 KB | nick_vh
#74 | 966796-74.patch | 95.51 KB | nick_vh
#73 | 966796-73.patch | 95.52 KB | nick_vh
#72 | 966796-72.patch | 66.51 KB | nick_vh
#69 | 966796-69.patch | 65.96 KB | nick_vh
#66 | 966796-66.patch | 7.82 KB | nick_vh
#67 | 966796-66.patch | 204.43 KB | nick_vh
#63 | 966796_63_solr_multi_indexer.patch | 68.03 KB | scor
#61 | 966796_60_solr_multi_indexer.patch | 65.89 KB | scor
#59 | 966796.patch | 65.61 KB | nick_vh
#55 | 966796_55_solr_entities.patch | 65.59 KB | scor
#47 | 966796.patch | 60.1 KB | nick_vh
#41 | 966796-multientity.patch | 62.34 KB | nick_vh
#39 | 966796-multientity.patch | 64.24 KB | nick_vh
#38 | 966796-multientity.patch | 332.87 KB | nick_vh
#37 | 966796-multientity.patch | 64.18 KB | nick_vh
#34 | 966796-multientity.patch | 57.71 KB | nick_vh
#30 | 966796-multientity.patch | 51.44 KB | nick_vh
#29 | 966796-multientity.patch | 51.84 KB | nick_vh
#27 | 966796-multientity.patch | 4.44 KB | nick_vh
#28 | 966796-multientity.patch | 56.11 KB | nick_vh
#25 | 966796-multientity.patch | 45.98 KB | wesnick
#23 | multientity-966796-w-issue-1187888-24.patch | 185.95 KB | wesnick
#19 | apachesolr-multientity-966796.patch | 41.99 KB | LSU_JBob
#12 | 966796_12_solr_me_wip.patch | 21.18 KB | scor

Comments

pwolanin’s picture

Collaborative effort at: https://github.com/palantirnet/apachesolr

Field SQL storage requires entity ID to be an int:

http://api.drupal.org/api/drupal/modules--field--modules--field_sql_stor...

but catch says generally it doesn't have to be, though that's going to be true usually. Is that why the schema has varchar 128? Is there some place that's specified?

pwolanin’s picture

In IRC, bjaspan says he thinks entity ID will always be int, since this is what Field API requires.

Crell’s picture

Well, that's ironic because of all modules Search API uses string machine names which are, I believe, used as the id. (I don't have the code in front of me, though.)

That actually sounds like a rather serious flaw in Field API, since I recall us discussing way back when that entities may not, in fact, have an integer surrogate key.

In any case, yes, entity_id is a varchar in the new indexer code because entities (I thought) are not guaranteed to use integer IDs.

pwolanin’s picture

In Search API the Solr schema has it as an integer for entity ID.

bjaspan said that if the entity does not use an int ID, it is responsible for transforming it into an int for consumption by Field API.

I changed the schema to int, but change it back if you think string is more correct. I assume the int will be more performant. Should the primary key be in the other order (int first) if we are using an int?

Crell’s picture

Progress!

The git repo noted above now has an indexer that is indexing both nodes and users using the new schema, and I have verified same. The current architecture should extend to other entity types fairly easily, with only a little code for each one. (Basically just to generate the rendered version to index.)

Of course, the theming side for search results currently assumes that it's getting back the fields that nodes would have and some of those are missing (eg, comment_count), so it's throwing various undefined index errors on the Apache_Solr_Document. We can fix that up later, I think. :-)

Peter, when do you want this rolled into a proper patch? With the new indexer in place, it still needs a UI and then we should rip out the old indexer entirely. There's plenty of documentation that needs doing, too.

pwolanin’s picture

Let's discuss the architecture a little more before rolling a patch.

Should the indexer be part of the framework module, or a separate module? Should the search module still be a separate module too? What's the dependency chain?

Crell’s picture

IMO, there should be a central API module (apachesolr) which handles the connection to the solr server(s). Then there's an indexing module for putting stuff INTO solr, and a searching module for getting stuff OUT of the server. (It's perfectly reasonable that we'd want to do those separately, or do only one of them.) The indexer and searcher should depend on the API module, but not each other.

With the new indexer I'm not sure if the existing comment and similar modules need to remain distinct. They could be rewritten as simply hook implementations on apachesolr_indexer, either in that module or in very small add-on modules. (Remember that apachesolr_indexer can easily implement its own hooks on another module's behalf, and is doing so now.) I suppose the UI approach we want to take will in part dictate that question.

Andrey Zakharov’s picture

Will this be backported to 6.x?
apachesolr_index hook will be very nice.
Subscribing

pwolanin’s picture

My comments from IRC to scor:

  • the palantir code does entity_info_alter - to add to each entity info details about whether and how it's indexed.
  • I'm not sure if there is an easy way to determine if entities are renderable or what
  • in any case - I think we need separate tracking tables for nodes, users, and comments, plus maybe an added generic tracking table
  • I think the palantir code just had the generic table
  • the other thing to think about is that we now are trying to support connections to multiple servers - and so you may want to index some entities to one server and not another
  • we need to consider how to index access grants for non-node entities
  • likely we want to take an approach similar to the node stuff and index a flag that says whether the content is available to anonymous users, since that's actually what we need to know for multi-site integration
  • next, define some kind of callback for each entity type to skip that filter based on a user_access call? unless you want to figure out how to abstract the node access system to entity access
  • In terms of docs to index, the current model anticipates that a callback may return multiple docs for one entity ID, so we probably should stick to that.
scor’s picture

Status: Active » Needs work
Attachment (new): 21.18 KB

very much WIP, I haven't addressed pwolanin's comments above, but tried to align the apachesolr_indexer module to the recent schema changes... this is what I've got so far. with that I'm able to index nodes and users.

citlacom’s picture

Hi all,

I would like to know the current status of this effort so I can join in. I'm looking for the correct way to index taxonomy terms as entities, to make it possible to search only for entities of type term. This effort looks like the closest match to what I need, but I'm not sure whether you plan to backport it to D6 or whether it will only be available for D7.

Another important question is whether you have been working on indexing taxonomy terms. I saw work on users, but I'm not sure about other entity types. Also, is there any documentation for the API of this new indexing approach? How will the queue of entity IDs pending indexing be stored? I have been working with the standard indexing queue "apachesolr_search_node", and that doesn't make much sense for an entity queue. Is there a new generic schema for this indexing queue, or will each indexing implementation need its own schema?

Thanks in advance, I would like to collaborate on this nice effort.

paulgemini’s picture

Was anything like this committed to any recent branches?

scor’s picture

The latest code for this issue is in the topic branch 966796-multientity. I'm not sure it's in sync with the main solr branch though.

rjbrown99’s picture

This is interesting. One of the things I'd like to index are messages from the Activity module. The messages are independent of nodes, so I would need something like this to add them as a non-node and then search on them. I'd also like this for the D6 branch but maybe that's a bit of a pipe dream at the moment.

digi24’s picture

As you mention it rjbrown, I have also been following this issue for some time, hoping for a D6 backport. I have some sites where upgrading to D7 would be a rather painful option.

LSU_JBob’s picture

Count me in for dev help on this, I need it.

LSU_JBob’s picture

Attachment (new): 41.99 KB

I did some work on the apachesolr_indexer submodule that was in the multientity branch, but after talking with pwolanin I realized that the multientity branch needed to be "resynced" with the latest version of the 7.x-1.x branch of the module. Anyways, I rebased with no conflicts and have the latest code from 7.x-1.x along with the multientity code now. So! I've attached a patch that should be able to get added on top of the current 7.x-1.x now without any worries of it being out of sync. I also have a local branch that works on my machine (indexing and simple searching of drupal entities).

pwolanin’s picture

Assigned: Crell » Unassigned

So from an architecture standpoint, I discussed with Crell a while ago that {apachesolr_indexer_entities} should possibly be a default table, but for very common entity types like node we should probably retain a separate table for performance reasons.

LSU_JBob’s picture

ok how about two tables, apachesolr_indexer_entities and apachesolr_indexer_nodes ? or perhaps apachesolr_indexer_common ?

pwolanin’s picture

The strategy we discussed before was to use the entity info to define a callback or table name. The thought was to have a "template" schema like the way the {cache} schema is re-used.

I could imagine nodes, comments, and users easily being of a scale where one would want separate tables.
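
A minimal sketch of that lookup, under stated assumptions: the 'index_table' key is borrowed from the issue summary, apachesolr_indexer_nodes is the table name LSU_JBob suggested above, and the helper example_get_index_table() is hypothetical.

```php
<?php
// Generic fallback table, as named earlier in this thread.
define('EXAMPLE_DEFAULT_INDEX_TABLE', 'apachesolr_indexer_entities');

/**
 * Hypothetical helper: returns the tracking table for an entity type,
 * falling back to the generic table when the entity info names none.
 */
function example_get_index_table(array $entity_info, $entity_type) {
  if (!empty($entity_info[$entity_type]['index_table'])) {
    return $entity_info[$entity_type]['index_table'];
  }
  return EXAMPLE_DEFAULT_INDEX_TABLE;
}

$entity_info = array(
  // A high-volume type that declares its own table.
  'node' => array('index_table' => 'apachesolr_indexer_nodes'),
  // A custom type that relies on the shared default.
  'myentity' => array(),
);
$node_table = example_get_index_table($entity_info, 'node');
$generic_table = example_get_index_table($entity_info, 'myentity');
```

With this shape, giving a large entity type its own table is a one-line change in the entity info rather than a code change in the indexer.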

wesnick’s picture

Attachment (new): 185.95 KB

I have been working on this recently, and have taken LSU_JBob's patch and have it successfully indexing other entities. There are still some issues to be resolved, and I would love to roll a patch with some of them resolved. I am posting what I have done, but I accidentally rolled in #1187888: Geospatial searching with solr 3.x, which just adds an XML for 3.3 and moves config files to a subdirectory.

Here are the things that I think should be implemented:

1. As per #22, if we really want to separate out our indexer tables, then I think we should just do it on bundles, each in their own table. These tables could be added as necessary from the /indexer/bundles admin page when you are checking off your indexed bundles, named something like apachesolr_indexer_{entity_type}_{bundle}. I'm not clear on why any entity type-bundle combo should be treated as more or less important than another.

2. A big issue was naming the Queue Worker callback apachesolr_indexer_process_entity(), because this looks like an implementation of hook_process_HOOK, so I renamed it to apachesolr_indexer_index_entity.

3. Putting entity bundle "indexable" flags in hook_entity_info_alter is probably not going to work, since many modules haven't actually declared their bundles when the code reaches here. I think every entity should have a single hook to register all of its workers/callbacks, something like hook_apachesolr_index_info(). Right now, entities declare stuff (status, document, reindex), bundles declare stuff (index_my_bundle = TRUE), and then there is a results callback that is declared somewhere else.

cpliakas’s picture

Subscribing.

wesnick’s picture

Attachment (new): 45.98 KB

I have rerolled my patch now that the other issue was committed. I think all the callbacks should be handled together with a ctools plugin. Then the callbacks could all live together. Something like:

    if ($callback = ctools_plugin_load_function('apachesolr', 'index', $entity_handler, 'reindex callback')) {
      $callback($document, $entity, $entity_type);
    }
wesnick’s picture

The current field mappings method is not going to work consistently. It is utilizing apachesolr core module's field mappings callbacks. This is problematic because the entity type information is not passed. The assumption in field_mapping is that they are always receiving a node. There is no universal way to find out the entity type unless we add a dependency on entity api, which would expose the entityType() method, but unfortunately core entities don't seem to have this method or reliable way to tell us what they are.

I propose we make this branch dependent on Entity API and use a generic EntityWrapper to extract all the properties which seems to be more widely implemented than hook_apachesolr_field_mappings. Search API does something like this for building indexes.

nick_vh’s picture

Attachment (new): 4.44 KB

In this patch an extra parameter is added
function HOOK_apachesolr_field_mappings_alter(&$mappings, $entity_type) {

This way a developer has more control over the mappings he decides to add to a specific field and if wanted also to a specific entity type. There is no per bundle alteration possible yet I assume. Not sure if this is also wanted?
Other than that a certain amount of trailing spaces were removed

Wesnick, could you explain in more depth how you would like to leverage entity api other than recognizing the type?

Todo

  • Remove duplicate indexing code and set smart defaults so the patch can be easily implemented
  • Create a callback/hook in hook_entity_info_alter so modules can define their callbacks to the entities after the info alteration. See #23 comment 3

Regarding the tables, a table per bundle does create quite a lot of clutter in the backend, don't you think? Maybe it is more performant to have the default Drupal bundles in their designated table and have any custom new one share the same table? It would be a waste of resources to give an entity type bundle with 1 entry its own table.

nick_vh’s picture

Attachment (new): 56.11 KB

Sorry, didn't include the new files.. :-)

PS: I forgot some debug message so don't freak out if you try. Whenever I have some of the other todos done I will get it out

nick_vh’s picture

Attachment (new): 51.84 KB

Added a hook that allows alterations of the default callbacks of the entity types node, user and taxonomy. If a developer would want he can modify these or even add hooks for other entity types without having to redo the entity_info_alter.

HOOK_apachesolr_indexer_get_entity_defaults_alter($entity_info) {

nick_vh’s picture

Attachment (new): 51.44 KB

Computer clearly does not want to co-operate with me. Cleaner version of the patch without weird symlinks. Apologies

ChrisFlink’s picture

Subscribing, will dive into the current status of this issue, I first checked this issue but this issue looks more constructive!

nick_vh’s picture

Peter, Could you remove that Branch from GIT to avoid confusion?

wesnick’s picture

@Nick_vh, I think this indexer module should use Entity API to lessen the burden on contrib for writing apachesolr-specific hooks. And currently, the field specific indexing_callback can't work without passing an $entity_type parameter. We could use Entity API to extract relevant property info and then have some type of gui similar to Search API where you can set up field mappings dynamically, toggle facet info, customize the target field name, etc.

nick_vh’s picture

Attachment (new): 57.71 KB

Updated with some new indexer functionalities

  • Load-balanced indexing between entity type, meaning that if you have 100 nodes and 150 terms it will index 25 of each if your cron_limit has been set to 50.
  • Removed some bugs from the indexing process that appeared to re-initiate after everything was indexed.
  • Entity types such as node, user and term now have their own indexer table. Everything else that doesn't have a default defined (implementable with hooks) will be sent to a generic table "apachesolr_indexer_entities". This allows the module to take some load off MySQL when working with huge datasets. If there is a need for a separate table, the module developer can hook into the defaults and set his table in a similar way to how cache works (see comment #22)
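
The load-balancing rule in the first bullet can be sketched as follows. This is only an illustration of the arithmetic, not the code from the patch: example_split_cron_limit() is a hypothetical name, and how the patch treats remainders or nearly empty queues is not stated here, so the sketch simply rounds the share down.

```php
<?php
/**
 * Hypothetical helper: splits the cron limit evenly across the entity types
 * that still have items waiting to be indexed.
 *
 * @param array $remaining
 *   Number of unindexed items, keyed by entity type.
 * @param int $cron_limit
 *   Total number of items to index in one cron run.
 */
function example_split_cron_limit(array $remaining, $cron_limit) {
  // Only entity types with something left to index take part in the split.
  $types = array_keys(array_filter($remaining));
  if (empty($types)) {
    return array();
  }
  // Even share per type, rounded down (an assumption of this sketch).
  $share = (int) floor($cron_limit / count($types));
  $batch = array();
  foreach ($types as $type) {
    // Never request more items than the type actually has waiting.
    $batch[$type] = min($share, $remaining[$type]);
  }
  return $batch;
}

// The example from the bullet above: 100 nodes, 150 terms, cron_limit of 50.
$batch = example_split_cron_limit(array('node' => 100, 'term' => 150), 50);
```

For the numbers in the bullet this yields 25 nodes and 25 terms per run.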

@wesnick I partly agree, but to make sure that this patch is committed before RC1, it might be better to move that particular configuration flexibility to another issue. The consensus right now is that this indexer module stays a submodule for now, so as soon as it is committed more alterations can be brought up. The next thing for this patch is to clean up the main apachesolr module by removing any indexing that happens there. I'll see if I can make that happen today.

pwolanin’s picture

For my part, I'm firmly opposed to adding a dependency on entity module or ctools for basic functionality for apachesolr.

@Nick_vh - for now, I'd probably index both 50 nodes AND 50 of each other entity type if the limit is 50.

nick_vh’s picture

@pwolanin That does make the solr indexing very heavy if you'd want to index 10 entity types (not sure when that would happen) = 10*50 = 500 instead of the configured 50?

On the other hand I don't see any use-case with 10 indexable entity types for now so I'll revert the load-balancing and modify documentation

nick_vh’s picture

Attachment (new): 64.18 KB

OK people, another update on this while we are at it.
This new version of the patch:

  • Removes some code chunks from the original module that were responsible for indexing
  • Optimizes the indexer algorithm during a cron run to send multiple documents instead of 1 document to solr (in analogy to the original module)
  • Removes the original indexer UI and replaces it with a similar but not optimal UI
  • Indexes 50 of each indexable entity type per cron run

Please try it out and fix the bugs that you encounter. I assume there is still code that needs to be removed from the original module

nick_vh’s picture

Status: Needs work » Needs review
Attachment (new): 332.87 KB

Something went wrong

ignore this reply please (or maintainer, please remove)

nick_vh’s picture

Attachment (new): 64.24 KB

Apparently I diffed with the multientity branch instead of the patch above. This time the right patch.

scripthead’s picture

Against which version is the above patch applied? It has failed against 7.x-1.x-dev and 7.x-1.0-beta9

nick_vh’s picture

Attachment (new): 62.34 KB

That is because lots of changes happened in between, I'd guess. I tried to re-apply the patch to the latest dev. Let me know if this one is working for you.

voitenkos’s picture

This patch failed against 7.x-1.x-dev and 7.x-1.0-beta10 , dated oct 20th.

Is there a newer patch I can use?

nick_vh’s picture

Same problem over and over :) Too many changes are happening. If you can please look at the patch and see where it conflicts. The majority of the patch is adding a new submodule so nothing should be very hard. I'll do it as soon as I can.

voitenkos’s picture

Having little/no experience with debugging patches(I only know how to apply them) - I don't actually know how to figure out where it breaks. Sorry =(

I tried it against apache_solr 7.x-1.x-dev and 7.x-1.0-beta10 , dated oct 20th, which is the latest release.
Here is what I got :

git apply 966796-multientity_2.patch
error: patch failed: apachesolr.api.php:48
error: apachesolr.api.php: patch does not apply
error: patch failed: apachesolr.module:270
error: apachesolr.module: patch does not apply
966796-multientity_2.patch:601: new blank line at EOF.
+
error: patch failed: apachesolr_search.module:144
error: apachesolr_search.module: patch does not apply

scor’s picture

please try patch -p1, it's more tolerant than git apply.

voitenkos’s picture

I manually went over the patch and patched the latest beta of solr. However, there is a bug, and it was occurring even before I started patching. Basically, the apachesolr module sends all I want to the index, but then those items never get processed. They only get processed after I manually restart solr. As if no "commit" command is happening.

nick_vh’s picture

Attachment (new): 60.1 KB

voitenkos: That sounds more like a problem with your solr instance. Please verify how you apply the patch
Attached is the patch that should apply cleanly on latest dev

voitenkos’s picture

That worked , thanks.

nick_vh’s picture

@voitenkos : If you tried it, could you please write down a review and tell us what you think should happen?

benys’s picture

@Nick_vh: Maybe apachesolr_indexer should implement hook_search_reset(), hook_update_index()? Do you have code repository to this patch?

nick_vh’s picture

@benys

Did some little research on that :
http://api.drupal.org/api/drupal/modules--search--search.api.php/functio...

Something like this? Correct me if I'm wrong :-)

function apachesolr_indexer_search_reset() {
  foreach (entity_get_info() as $entity_type => $entity_info) {
    // Skip entity types that are not flagged as indexable. empty() also
    // avoids an undefined index notice when the key is missing.
    if (empty($entity_info['apachesolr']['indexable'])) {
      continue;
    }
    // Only reindex entity types that have at least one bundle enabled.
    $bundles = apachesolr_indexer_get_bundles('default', $entity_type);
    if (empty($bundles)) {
      continue;
    }
    $reindex_callback = apachesolr_entity_get_callback($entity_type, 'reindex callback');
    // Stop at the first entity type whose reindex callback reports a failure.
    if (!empty($reindex_callback) && !$reindex_callback()) {
      drupal_set_message(t('There was an error reindexing @entity_type.  Please consult the log for more information.', array('@entity_type' => $entity_info['label'])), 'error');
      return;
    }
  }
}
benys’s picture

It works for me.
Now I think that apachesolr_indexer_action_form_reset_confirm_submit() should execute apachesolr_indexer_search_reset() :-)

benys’s picture

Hi!

I have another question. Shouldn't your patch remove the node hooks (apachesolr_node_insert, apachesolr_node_delete, apachesolr_node_update) from apachesolr.module? Right now I can't delete a node :-(

Fatal error: Call to undefined function apachesolr_delete_node_from_index() in ../sites/all/modules/apachesolr/apachesolr.module on line 848
scor’s picture

Status: Needs review » Needs work

1. I agree with switching the radio button + "Begin" button with a button for each action. however why did you remove the "Index queued content" option? I find that useful to avoid having to run the cron multiple times...

2. I had trouble reindexing the whole site with this patch applied (maybe due to the missing button as explained above), but also because

$ drush solr-index
The external command could not be executed due to an application error.
Drush command terminated abnormally due to an unrecoverable error.
Error: Call to undefined function apachesolr_get_nodes_to_index() in
drupal/sites/all/modules/apachesolr/apachesolr.admin.inc, line 942

3. At the moment the search result for a user only shows the name, and "..." for the search snippet. Is there a technical reason for not indexing the content of user_view()/entity_view() in $document->content in a similar fashion as we do for nodes?

scor’s picture

Attachment (new): 65.59 KB

I've almost got the batch indexing working: it indexes entities, though it does not always finish, because apachesolr_index_status() -- which is used to set the batch sandbox max -- hasn't been updated to account for multiple entities. I'm uploading the patch as far as I got so hopefully someone can take it further. Note that I left some of the variables from the old batch, but these will need to be updated.

patch also cleans up apachesolr_index.info

nick_vh’s picture

Cool! Will look at it asap!

scor’s picture

looks like

unset($build['#theme']);

is the reason why users don't have their profile in $document->content. The reason it is on node is that it avoids indexing the node links and other wrapping elements.

scor’s picture

The hook_apachesolr_update_index() has not been ported to the new multi entity system.
Here is what one can use in apachesolr_indexer_solr_document_node():

  // Let modules add to the document.
  foreach (module_implements('apachesolr_update_index') as $module) {
    $function = $module . '_apachesolr_update_index';
    // @todo specify namespace
    $namespace = 'apachesolr_search';
    $function($document, $node, $namespace);
  }

maybe this could be generalized and run at the entity level.
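
A possible shape for such an entity-level version is sketched below. It is standalone PHP, so the list of implementing modules is passed in rather than fetched with module_implements(), and the extra $entity_type argument, the helper name example_invoke_update_index() and the "demo" module are all assumptions made for illustration, not the signature the module ended up with.

```php
<?php
/**
 * Hypothetical entity-level variant of the loop above. $modules stands in
 * for the result of module_implements('apachesolr_update_index').
 */
function example_invoke_update_index(array $modules, $document, $entity, $entity_type, $namespace = 'apachesolr_search') {
  foreach ($modules as $module) {
    $function = $module . '_apachesolr_update_index';
    // Guard against modules that are listed but do not implement the hook.
    if (function_exists($function)) {
      $function($document, $entity, $entity_type, $namespace);
    }
  }
}

/**
 * Example implementation by a hypothetical module named "demo": adds a
 * made-up field to the document. Objects are passed by handle, so the
 * change is visible to the caller.
 */
function demo_apachesolr_update_index($document, $entity, $entity_type, $namespace) {
  $document->ss_demo_type = $entity_type;
}

$document = new stdClass();
$account = (object) array('uid' => 1);
example_invoke_update_index(array('demo', 'missing'), $document, $account, 'user');
```

Passing the entity type along lets one implementation handle nodes, users and terms without guessing what kind of object it received.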

nick_vh’s picture

Attachment (new): 65.61 KB

Rerolled the patch #55 to cleanly apply

scor’s picture

Status: Needs work » Needs review

incorporating suggestion #58 in #59.

Note that I've been using this patch in my patched version of apachesolr for 10 days now and I haven't encountered any major issue so far (I also have #1338350: Extend facetapi_map_bundle() to all entity types applied on facetapi). The site I'm working on is going for QA shortly, and I will report back here if we encounter problems.

scor’s picture

Attachment (new): 65.89 KB

here is the patch, sorry.

raulmuroc’s picture

Is it expected to work under Acquia Search environment?

scor’s picture

Attachment (new): 68.03 KB

fix fatal error when deleting a node: Call to undefined function apachesolr_delete_node_from_index(). This new patch removes the old hook_node_insert(), hook_node_update() and hook_node_delete() from apachesolr since these are now handled by apachesolr_indexer.module

pwolanin’s picture

Status: Needs review » Needs work

The 2nd table in the schema is not really aligned with the module as it is now.

t('The name of the core.'),

We don't reference cores, but rather search environments. Also, in the UI, we only support indexing to the default environment.

nick_vh’s picture

@RaulMuroc, Acquia Search environment is using the regular Solr so this should work yes.

nick_vh’s picture

Attachment (new): 7.82 KB

Quick re-roll of the Multi indexer. No new stuff

nick_vh’s picture

Attachment (new): 204.43 KB

Git.. Not always my friend

BarisW’s picture

Thanx Nick!

nick_vh’s picture

Attachment (new): 65.96 KB

Hold on, git is fighting with me!
*crossing fingers for this one :-)*

nick_vh’s picture

imho a todo for this patch

  1. The apachesolr Status page should be different than the apachesolr indexer page.
  2. The entity type inclusion/exclusion should happen in the Bias configuration page (form alter I suppose)
  3. By default, for this patch to succeed we should probably stick to nodes only, and with follow-up patches we can add support for users and taxonomies. It is probably asking too much of a single patch to supply all of this.
  4. Create an API file for the indexer, or update the API file from the apachesolr module.
BarisW’s picture

If I run drush solr-index I get the following error:

Error: Call to undefined function apachesolr_get_nodes_to_index() in sites/all/modules/contrib/apachesolr/apachesolr.admin.inc, line 854

nick_vh’s picture

Attachment (new): 66.51 KB

Removed some dependencies that would not exist if you define a custom entity type

nick_vh’s picture

Attachment (new): 95.52 KB

Based on some of the suggestions of BarisW and pwolanin I updated this code heavily.

It should at least integrate much better in the current UI and I've split the user and taxonomy support into separate modules. Regarding the location of these modules, I wish #1324854: Move apachesolr_search.* in a submodule could be revisited soon, because it'll be a mess if we put all those modules in the root directory.

nick_vh’s picture

Attachment (new): 95.51 KB

Small fix that makes the index all button work

I forgot a dsm in the patch, not uploading a new patch just for that so please take that into consideration

BarisW’s picture

Hi Nick,

don't know what's wrong, but after applying your patch to the latest 7.x-1.x-dev module, I get the following error:

Fatal error: Call to undefined function apachesolr_index_get_last_updated() in sites/all/modules/contrib/apachesolr/apachesolr.module on line 692

nick_vh’s picture

Hmm, there should only be one left and that is in the cron. Can you confirm that if you try with the GUI everything is more or less working?

nick_vh’s picture

Attachment (new): 100.45 KB

This should solve that problem. Still quite some work to do with this patch though.

BarisW’s picture

Yep, that issue is gone now. Now a new one when re-indexing (translated from Dutch):

An AJAX HTTP error occurred. HTTP Result Code: 200 Debugging information follows. Path: /batch?id=80&op=do StatusText: OK ResponseText: {"status":true,"percentage":"1","message":"Submitting content to Solr...\u003cbr \/\u003eIndexed 400 of 65425 nodes"} Fatal error: Call to undefined function apachesolr_get_nodes_to_index() in sites/all/modules/contrib/apachesolr_attachments/apachesolr_attachments.module on line 154

nick_vh’s picture

Attachment (new): 91.76 KB

As I mentioned above, I merged the indexer into apachesolr.module for this patch. However, I did make a distinction between the indexing functions and the other ones.

This patch should also remove almost all clutter from the previous indexation mechanism.

Todo

  • Make sure all references to the old indexation mechanism are gone
  • Backwards compatibility with apachesolr_get_nodes_to_index (for apachesolr_attachments module)
  • Auto enable the node entity when installing
  • Upgrade script? (we remove some tables)
  • When enabling a new entity type, it should add all the content of that type as remaining to the list
nick_vh’s picture

Attached file (status: new, size: 5.54 KB)

Indexer modules for term and user in a zip as a reference for the future.

nick_vh’s picture

Attached file (status: new, size: 90.71 KB)

Some nasty bug prevented regular nodes from being added to the index

nick_vh’s picture

Attached file (status: new, size: 99.87 KB)

Added drush support and better garbage cleanup

BarisW’s picture

When I submit the Reindex button (and confirm it) I get this error:

Fatal error: Call to undefined function apachesolr_index_mark_all_for_reindex() in sites/all/modules/contrib/apachesolr/apachesolr.admin.inc on line 811

nick_vh’s picture

Attached file (status: new, size: 101.67 KB)

Forgot to upload the latest one. See if it solves your problem

nick_vh’s picture

Attached file (status: new, size: 101.67 KB)

Minor problem that caused indexing problems through the UI

BarisW’s picture

Another bug. When I save a node, I get this error:

Call to undefined function apachesolr_index_entity_update() in sites/all/modules/contrib/apachesolr/apachesolr.module on line 1627

nick_vh’s picture

Attached file (status: new, size: 101.23 KB)

New patch, incorporates an upgrade path + a little bugfix.

nick_vh’s picture

Attached file (status: new, size: 104.61 KB)

Updated the upgrade path for excluded types.
I still need to move the excluded types back to the environment-specific configuration instead of the indexer page, because switching between environments would make this confusing.
The entity bundles to index are now environment-specific as well.

nick_vh’s picture

Attached file (status: new, size: 105.46 KB)

Indexing is also environment-specific, and the API for the bundles and entity types is now used in a more optimized way.

Most functions now accept $env_id, so environment-specific indexing is possible, but it is limited to the default environment for now (similar to the current state of the module).

Still working on the cron part. Not much work left there.

nick_vh’s picture

Attached file (status: new, size: 105.4 KB)

Latest work in progress - Just keeping everyone informed :-)

nick_vh’s picture

Attached file (status: new, size: 108.37 KB)

The install and upgrade paths are now equal, and I have rerolled the patch so it can be applied to the latest beta12 and dev.

Todo :

  • Cron seems to remove all the remaining tasks after 1 try
  • Add button to run 1 cron index
nick_vh’s picture

Attached file (status: new, size: 109.92 KB)

The cron button was added and cron is working properly now (indexing 50 items per run).
The only weird behavior I am encountering is that Solr reports processing 100 documents while I am very sure I am only sending 50. More investigation needed.

Todo: Should "Index all remaining" list how many items are remaining?

Testers wanted!

BarisW’s picture

Thanks Nick, awesome work. I can now index my custom entity and they appear in my search results. The only thing I'm missing is the integration with "More like this". In the MLT settings, I can only select Node Types, my custom entity is not available. And even if it was available, the MLT blocks don't show up on entity pages at all.

The problem is that the block code relies on $node.

If I change this:

function apachesolr_search_block_view($delta = '') {
  if ($delta != 'sort' && ($node = menu_get_object()) && (!arg(2) || arg(2) == 'view')) {
    $suggestions = array();
    // Determine whether the user can view the current node. Probably not necessary.
    $block = apachesolr_search_mlt_block_load($delta);
    if ($block && node_access('view', $node)) {
      // Get our specific environment for the MLT block
      $env_id = (!empty($block['mlt_env_id'])) ? $block['mlt_env_id'] : '';
      $solr = apachesolr_get_solr($env_id);
      $docs = apachesolr_search_mlt_suggestions($block, apachesolr_document_id($node->nid), $solr);
      if (!empty($docs)) {
        $suggestions['subject'] = check_plain($block['name']);
        $suggestions['content'] = array(
          '#theme' => 'apachesolr_search_mlt_recommendation_block',
          '#docs' => $docs,
          '#delta' => $delta
        );
      }
    }
    return $suggestions;
  }
}

To this:

function apachesolr_search_block_view($delta = '') {
  if ($delta != 'sort') {
    $suggestions = array();
    // Determine whether the user can view the current node. Probably not necessary.
    $block = apachesolr_search_mlt_block_load($delta);
    if ($block) {
      // Get our specific environment for the MLT block
      $env_id = (!empty($block['mlt_env_id'])) ? $block['mlt_env_id'] : '';
      $solr = apachesolr_get_solr($env_id);
      $docs = apachesolr_search_mlt_suggestions($block, apachesolr_document_id(arg(2), 'MYENTITY'), $solr);
      if (!empty($docs)) {
        $suggestions['subject'] = check_plain($block['name']);
        $suggestions['content'] = array(
          '#theme' => 'apachesolr_search_mlt_recommendation_block',
          '#docs' => $docs,
          '#delta' => $delta
        );
      }
    }
    return $suggestions;
  }
}

it works!

So the solution is probably to check permissions in an entity-aware way (entity_access()?) and to pass the entity type when retrieving the document ID (apachesolr_document_id() has a second argument that defaults to 'node').
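
A rough sketch of that idea, not taken from any patch in this issue. The helper name my_module_mlt_document_id() is made up. It assumes entity_access() from the contributed Entity API module may or may not be installed, and it relies only on what is stated above: apachesolr_document_id() takes the ID first and the entity type second.

```php
/**
 * Hypothetical helper: returns the Solr document ID to use for an MLT query
 * on an arbitrary entity, or FALSE when viewing is explicitly denied.
 */
function my_module_mlt_document_id($entity_type, $entity_id, $entity = NULL) {
  // entity_access() is provided by the contributed Entity API module, so it
  // is only called when that module is present and an entity was passed in.
  if ($entity !== NULL && function_exists('entity_access')) {
    // entity_access() returns NULL when the entity type defines no access
    // callback; only an explicit FALSE is treated as "denied" here.
    if (entity_access('view', $entity_type, $entity) === FALSE) {
      return FALSE;
    }
  }
  // The second argument defaults to 'node', so pass the real type explicitly.
  return apachesolr_document_id($entity_id, $entity_type);
}
```

The block callback could then call this helper instead of hard-coding apachesolr_document_id($node->nid), and skip the block when it returns FALSE.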

pwolanin’s picture

So, I'd consider the shipped MLT block to be only for nodes by design. At the least, I wouldn't consider it an impediment to going forward with this patch, since you can easily make your own MLT blocks for your entity type.

BarisW’s picture

That's true, but I'm almost done with making MLT work as well. I'll submit a patch in an hour.

BarisW’s picture

Attached file (status: new, size: 113.76 KB)

See attached patch for MLT fixes. Is there a hook_update needed as well?

BarisW’s picture

Attached file (status: new, size: 113.93 KB)

Ah, forgot a small addition.

nick_vh’s picture

Attached file (status: new, size: 32.6 KB)

I guess you took the wrong patch as basis?
See the interdiff with 96-92

nick_vh’s picture

Attached file (status: new, size: 5.71 KB)

This is the interdiff between 91 and 96. That's more like it I guess :-)

pwolanin’s picture

Let's make MLT discussion a separate patch.

BarisW’s picture

Attached file (status: new, size: 115.31 KB)

As requested by Nick, here's my patch applied to 966796-92.

voitenkos’s picture

Is there a patch for the latest dev (beta12) that will allow me to index users?

pwolanin’s picture

@voitenkos - the current patch still requires you to implement some code in a custom module to enable anything beyond nodes.

voitenkos’s picture

@pwolanin - are any examples of custom code available out there? I don't mind coding, but I just want to know where to start. I assume the current patch is 966796-101.patch from comment #101. Am I right?

nick_vh’s picture

Voitenkos : Take patch #92 for now + suggestions from the modules in #80

LSU_JBob’s picture

Voitenkos, check out this profile2 submodule I wrote to integrate it into Apache Solr Integration (multi entity).

#1267330: ApacheSolr integration for Profile2

nick_vh’s picture

Issue summary: View changes

Updated issue summary.

nick_vh’s picture

Issue summary: View changes

Updated issue summary.

nick_vh’s picture

Attached file (status: new, size: 126.22 KB)

I merged #1368542: Remove $namespace, and change document handlers to be an alter hook + API documentation cleanup into patch #92, and this will be the basis going forward. The patch from BarisW will be moved to another issue once this one is committed.
I'd like to invite all of you to test this patch and comment with corrections, but not with improvements (YET!)

What does this patch do?
adds multi-entity support while maintaining the current functionality. It does not add functionality yet.

How to use it?

function my_module_apachesolr_index_get_entity_defaults_alter(&$entity_info) {
  $entity_info['monument']['indexable'] = TRUE;
  $entity_info['monument']['status callback'] = 'my_module_status_callback';
  $entity_info['monument']['document callback'][] = 'my_module_document';
  $entity_info['monument']['reindex callback'] = 'my_module_reindex';
}

See comments above for examples for User, Terms and Profile2

Todo:

  1. Find a better name for hook_apachesolr_index_get_entity_defaults_alter
  2. Make sure we remove as many $env_id = NULL as possible to make it more robust
  3. Review the patch to make sure we do not remove existing functionality or bugfixes
  4. Backwards compatibility with apachesolr_get_nodes_to_index (for apachesolr_attachments module)
  5. When enabling a new entity type, it should add all the content of that type as remaining to the list
BarisW’s picture

Status: Needs work » Needs review

Thanks for the update. Have all the TODOs from the previous comments been implemented?

E.g. the one in #79: Backwards compatibility with apachesolr_get_nodes_to_index (for apachesolr_attachments module)

nick_vh’s picture

Added that to the todo ;-)

nick_vh’s picture

Attached file (status: new, size: 131.42 KB)

Removed some of the flexible $env_id possibilities so the function calls are more strict. They now force you to give an $env_id so there is less chance of mistakes.

nick_vh’s picture

Issue summary: View changes

Updated issue summary.

nick_vh’s picture

Issue summary: View changes

Changed latest patch

nick_vh’s picture

Attached file (status: new, size: 136.53 KB)

  1. Find a better name for hook_apachesolr_index_get_entity_defaults_alter: Changed to hook_apachesolr_entity_info_alter
  2. Make sure we remove as many $env_id = NULL as possible to make it more robust: See comment #110
  3. Review the patch to make sure we do not remove existing functionality or bugfixes
  4. Backwards compatibility with apachesolr_get_nodes_to_index (for apachesolr_attachments module)
  5. When enabling a new entity type, it should add all the content of that type as remaining to the list
  6. Remove all the unneeded module load includes and move the module_load_include into a hook_init
  7. apachesolr_index_get_entities_to_index only accepts an integer ID. Should be fixed or documented clearly
  8. apachesolr_index_mark_all_for_reindex needs entity type support
nick_vh’s picture

Issue summary: View changes

Updated issue summary.

nick_vh’s picture

Issue summary: View changes

Updated issue summary.

Georgique’s picture

Subscribe

nick_vh’s picture

Attached file (status: new, size: 137.63 KB)

  1. Review the patch to make sure we do not remove existing functionality or bugfixes
  2. apachesolr_index_get_entities_to_index only accepts an integer ID. Should be fixed or documented clearly
  3. Find a better name for hook_apachesolr_index_get_entity_defaults_alter: Changed to hook_apachesolr_entity_info_alter
  4. Make sure we remove as many $env_id = NULL as possible to make it more robust: See comment #110
  5. Backwards compatibility with apachesolr_get_nodes_to_index (for apachesolr_attachments module)
  6. When enabling a new entity type, it should add all the content of that type as remaining to the list
  7. Remove all the unneeded module load includes and move the module_load_include into a hook_init
  8. apachesolr_index_mark_all_for_reindex needs entity type support
nick_vh’s picture

Issue summary: View changes

Updated issue summary.

nick_vh’s picture

Attached file (status: new, size: 137.63 KB)

  1. Review the patch to make sure we do not remove existing functionality or bugfixes
  2. apachesolr_index_get_entities_to_index only accepts an integer ID. Should be fixed or documented clearly
  3. Find a better name for hook_apachesolr_index_get_entity_defaults_alter: Changed to hook_apachesolr_entity_info_alter
  4. Make sure we remove as many $env_id = NULL as possible to make it more robust: See comment #110
  5. Backwards compatibility with apachesolr_get_nodes_to_index (for apachesolr_attachments module)
  6. When enabling a new entity type, it should add all the content of that type as remaining to the list
  7. Remove all the unneeded module load includes and move the module_load_include into a hook_init
  8. apachesolr_index_mark_all_for_reindex needs entity type support: Renamed to apachesolr_index_mark_all_for_reindex($env_id, $type)

@Georgique: There is a follow button at the top of the page. There is no need for Subscribe replies anymore.

PS: I wanted to change the opening post, but that seems to be impossible at the moment, so it is out of sync for now.

nick_vh’s picture

Issue summary: View changes

Changed the todo list

nick_vh’s picture

Issue summary: View changes

Update todo

nick_vh’s picture

Issue summary: View changes

Updated issue summary.

nick_vh’s picture

Issue summary: View changes

Updated issue summary.

Georgique’s picture

Subscribe

nick_vh’s picture

Attached file (status: new, size: 138.22 KB)

The reason this was removed is that this value depends on the entity type. It might be better to always clear the cache? Some history of this is welcome

-    if (apachesolr_index_get_last_updated()) {
-      $solr->clearCache();
-    }
+    $solr->clearCache();

Should we put $env_id at the beginning of the function argument list or at the end? Is there a guideline for consistency?

-  $rows = apachesolr_get_nodes_to_index('apachesolr', $limit);
-  $pos = apachesolr_index_nodes($rows, 'apachesolr');
+  $rows = apachesolr_index_get_entities_to_index($limit, $env_id);

This needs to be updated:

function apachesolr_mark_node($nid) {
  db_update('apachesolr_search_node')
    ->condition('nid', $nid)
    ->fields(array('changed' => REQUEST_TIME))
    ->execute();
}
  • Convert apachesolr_mark_node($nid) to apachesolr_mark_entity($entity_id, $entity_type) but keep the apachesolr_mark_node for backwards compatibility
  • Add entity support for apachesolr_drush_solr_delete_index drush command
  • Convert apachesolr_node_type_update($info), apachesolr_taxonomy_term_update($term), apachesolr_user_update(&$edit, $account, $category)
  • Review the API docs to reflect the new hooks/alters
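
The first bullet could look roughly like the sketch below. This is not the committed code. The my_module_ prefix marks both functions as hypothetical, and two things are assumed: that apachesolr_get_indexer_table() (the name mentioned later in this thread) takes an entity type and returns the tracking table name, and that the tracking table has entity_id and changed columns.

```php
/**
 * Hypothetical sketch: flags one entity as changed so that the next index
 * run picks it up again.
 */
function my_module_mark_entity($entity_id, $entity_type = 'node') {
  // Assumption: returns the tracking table name for this entity type.
  $table = apachesolr_get_indexer_table($entity_type);
  if (empty($table)) {
    return;
  }
  // Assumption: tracking rows are keyed on an 'entity_id' column.
  db_update($table)
    ->condition('entity_id', $entity_id)
    ->fields(array('changed' => REQUEST_TIME))
    ->execute();
}

/**
 * Backwards-compatible wrapper that keeps the old node-only signature.
 */
function my_module_mark_node($nid) {
  my_module_mark_entity($nid, 'node');
}
```

Keeping the wrapper means existing callers that only know about node IDs would not have to change.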

What to do with comments? Currently they have been removed from the node indexing process if I recall correctly. I'd support the decision to add a contrib module for comment indexing and ship it with the module?

nick_vh’s picture

Issue summary: View changes

More todos

nick_vh’s picture

Attached file (status: new, size: 139.04 KB)

Added documentation for

hook_apachesolr_index_document_build($documents[$id], $entity, $entity_type);
hook_apachesolr_index_document_' . $entity_type . '_build($documents[$id], $entity, $entity_type);
hook_apachesolr_index_document_' . $entity_type . '_' . $bundle . '_build($documents[$id], $entity, $entity_type);
hook_apachesolr_index_document_alter($documents_id, $entity, $entity_type);

All the hooks are now available in the API file. Documentation could always be more extensive but for this patch it should be sufficient I guess
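
As an illustration only, an implementation of the first hook might look like the sketch below. The three-argument signature follows the list above. The 'monument' entity type is borrowed from the earlier example in this thread, while the city property and the ss_monument_city field name are made up (the ss_ prefix assumes the module's schema maps it to a single-valued string field).

```php
/**
 * Hypothetical implementation of hook_apachesolr_index_document_build().
 */
function my_module_apachesolr_index_document_build($document, $entity, $entity_type) {
  // Only touch documents that are built for the custom entity type.
  if ($entity_type != 'monument') {
    return;
  }
  // Assumption: the entity has a 'city' property worth indexing.
  if (!empty($entity->city)) {
    // Fields are set as properties on the document object.
    $document->ss_monument_city = $entity->city;
  }
}
```

The entity-type and bundle specific variants would presumably work the same way, just with the type check moved into the hook name.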

nick_vh’s picture

Attached file (status: new, size: 139.39 KB)

Small API doc update

pwolanin’s picture

Status: Needs review » Needs work

We need to index comments by default with the rendered nodes, so that needs to be restored.

nick_vh’s picture

Committed this to a separate branch. Let's continue working on this in this issue, but maybe with separate patches for specific functionality?

http://drupalcode.org/project/apachesolr.git/shortlog/refs/heads/7.x-1.x...

nick_vh’s picture

Issue summary: View changes

Updated issue summary.

nick_vh’s picture

Attached file (status: new, size: 139.53 KB)

Small update: I had problems on a cold start of a new Drupal site.

nick_vh’s picture

Attached file (status: new, size: 4.52 KB)

Smaller one diffed against the branch and committed to the branch

nick_vh’s picture

Attached file (status: new, size: 2.51 KB)

Another one: having zero as the minimum was leading to severe problems when querying. This patch fixes that. It has been committed, but I'm sure there could be a better solution to this problem.

swentel’s picture

Status: Needs work » Needs review
Attached file (status: new, size: 614 bytes)

Small isset() check added in the entity_info alter for the case where you don't implement apachesolr_entity_info(). Or should we really require that? (I haven't read the complete thread.)
(Without the check, it overwrites the apachesolr key that I set in my custom hook_entity_info() implementation.)

nick_vh’s picture

You could probably use hook_apachesolr_entity_info_alter, but it seems like a legit patch :)
Pushed it into the branch

nick_vh’s picture

Attached file (status: new, size: 6.72 KB)

Some fixes we've found during the testing of this branch with swentel (committed as well)

swentel’s picture

Attached file (status: new, size: 445 bytes)

And another small fix.

nick_vh’s picture

Attached file (status: new, size: 1.51 KB)

Facets were not working because of an assumption that the query name is apachesolr. I reverted this to the old state.

nick_vh’s picture

Committed both

nick_vh’s picture

I suspect we should review apachesolr_entity_fields and make it easier to add custom fields per entity type and expose them to facetapi.
This already works natively with facetapi:

function my_module_facetapi_facet_info($searcher_info) {
  $facets = array();
  $facets['xx_solrname'] = array(
    'field' => 'xx_solrname',
    'label' => t('Data name'),
    'description' => t('Filter by Data name'),
  );
  return $facets;
}
nick_vh’s picture

Attached file (status: new, size: 6.6 KB)

This patch clears more things

  1. apachesolr_index_get_entities_to_index only accepts an integer ID. Should be fixed or documented clearly: I've documented it clearly.
  2. Convert apachesolr_mark_node($nid) to apachesolr_mark_entity($entity_id, $entity_type) but keep the apachesolr_mark_node for backwards compatibility: I got rid of mark_node and fixed the other ones. This breaks backwards compatibility, but since we are not stable anyway this is not a problem.
  3. Add entity support for apachesolr_drush_solr_delete_index drush command: Delete index does not support type, so drush should not either. Could be a feature request, but not for this issue.
  4. Convert apachesolr_node_type_update($info), apachesolr_taxonomy_term_update($term), apachesolr_user_update(&$edit, $account, $category): Done!
nick_vh’s picture

Issue summary: View changes

Updated issue summary.

nick_vh’s picture

Committed and merged all changes from 7.x-1.x into 7.x-1.x-multientity + patch #131

nick_vh’s picture

I created some sandboxes to show how you can integrate your own custom entity with this branch

Apachesolr Commerce integration : http://drupal.org/sandbox/nickvh/1379372
Apachesolr User integration : http://drupal.org/sandbox/nickvh/1379368
Apachesolr Term integration : http://drupal.org/sandbox/nickvh/1379370

I guess this shows you how easy it can be to add your own entity type to the indexer.

nick_vh’s picture

Attached file (status: new, size: 1.95 KB)

Comments are added again. We really have to make sure this does not break any functionality!

nick_vh’s picture

Attached file (status: new, size: 2.3 KB)

Some silly mistake in the use of arguments from a new function. Attached patch has been tested and it is indexing comments for the node types. It would be fairly easy to index them as separate entities as well but I leave that up to someone else. Committed this version

nick_vh’s picture

Status: Needs review » Fixed

Christmas present! It's been committed to the latest development version. Have fun with this and report any bugs as a new issue please :-)

digi24’s picture

Any hope for a D6 backport?

nick_vh’s picture

Yes, there is hope for a backport ;-) Merry Christmas!

LSU_JBob’s picture

holy cow, thanks for this Nick_vh, it's a Christmas miracle

ericmulder1980’s picture

Current development version of apachesolr breaks the apachesolr_attachments module.

Fatal error: Call to undefined function apachesolr_index_nodes()

Added to the apachesolr_attachments issue queue : http://drupal.org/node/1393540

nick_vh’s picture

Status: Fixed » Closed (fixed)
rjbrown99’s picture

Update from #133. If you were looking for sandboxes, they are now here:

User: http://drupal.org/project/apachesolr_user_indexer
Term: http://drupal.org/project/apachesolr_term
Commerce: http://drupal.org/project/apachesolr_commerce

swanpoint’s picture

We're looking at using this module on our corporate intranet to help manage our knowledge base; but missing apachesolr_index_nodes() stops us in our tracks.

pwaterz’s picture

What is the status of this? I cloned the user indexer module and tried to replicate it for the profile2 module, but it is not working. I got a function not found error for the function 'apachesolr_index_get_indexer_table'.

pwaterz’s picture

@rjbrown99 the function name has changed to apachesolr_get_indexer_table

pwaterz’s picture

Issue summary: View changes

New todo list