Especially for node, access rights should be better incorporated. Doing this on a generic entity-level is well-nigh impossible, but we should at least provide tools to make configuring this easier, and maybe some specialized ones for nodes.
Especially data-alter callbacks could/should be used to restrict the entities to be indexed, e.g. to only index nodes of specific types or (unrelated, but would also be nice) languages. One (rather sophisticated) approach would be this one: #939430: Integrate Rules conditions to filter indexed entities.
Putting a "Published" filter on views is a thing most people will probably know from normal node views anyways. For search pages there is no such solution, though, so maybe we should clearly state this somewhere when creating one (or at least if it's one for nodes)?

What else can we do in this area?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

fago’s picture

Hm, I think the apache_solr module has some node access integration. I wonder how they solved that?

drunken monkey’s picture

In a rather complicated way, from what I remember. Doing node access isn't an easy task if you want to mind all cases. Could maybe / probably be added through a data-alter callback and a query preprocessor, if someone knows how it works.

fago’s picture

I see, maybe we could provide an option to run entity_metadata_entity_access() on the result and hide the entity if its false? For sure, not very efficient.. As default behaviour I guess just documenting it would be best.

Dave Reid’s picture

Component: Code » Framework
Priority: Normal » Major

Wow, this is actually kind of a critical thing and would prevent any stable release since it would be a security bug.

becw’s picture

Checking each entity for access and then hiding prohibited items would mess up paging in most situations. For nodes, basic access issues could be addressed by not indexing unpublished content by default, and to hide all results when accessed by a user without "access content". For other entity types... generous use of Search API processors to remove sensitive data during indexing?

drunken monkey’s picture

With views, you can easily filter out inaccessible items in most cases.
Furthermore, #939430: Integrate Rules conditions to filter indexed entities will provide additional ways to filter out items at index-time. Also possible would be a data alteration that lets admins specify a user and filters out all items the user has no access on.

For now, site builders will just have to take care of this in other ways. Not sure if you could call that a "security bug".

Or should we also add entity_access calls as kind of a last resort, and accept that it might mess up paging? However, this would be a false feeling of security, as entities might not even always be loaded, so we can't always check for access rights – meaning there would then really be security holes.

Dave Reid’s picture

Problem is, we're not using views for the search result pages. We're using the built-in search_api_page module. Is there a hook that runs prior to indexable items being saved to the index? I saw something possibly referenced in another issue but there was no mention of the hook name nor a link to the commit where it was added.

drunken monkey’s picture

Yes, there is a hook: hook_search_api_index_items_alter(). You can of course use this if you are sure you always want to apply the same rules regarding what items should be indexed.
Otherwise, a data alteration would be a much cleaner way – and, if generic enough (although in this case I'd probably also consider adding something node-centric), it could even be added back to this module.

Dave Reid’s picture

@drunken monkey: If I edit an existing node that has already been indexed to unpublish it, will implementing hook_search_api_index_items_alter() remove it from the existing index as well Or what would we need to do?

drunken monkey’s picture

Yes, this is already done. There is just a slight gap where it is still indexed, between being unpublished and the next cron run, of course.

Dave Reid’s picture

For reference, here's what we had to implement in the meantime:

/**
 * Implements hook_search_api_query_alter().
 *
 * Currently the search API module does not factor in node access in its
 * indexing nor queries. This is a temporary solution until entity access is
 * properly respected upstream. This solution is very temporary and fragile
 * in the fact that it can only add a check on the {node}.status field. Proper
 * node access and grants are not respected at all.
 *
 * @todo Remove when http://drupal.org/node/955088 is fixed.
 */
function custom_search_api_query_alter(SearchApiQueryInterface $query) {
  $index = $query->getIndex();
  if ($index->entity_type == 'node') {
    if (!user_access('bypass_node_access')) {
      $query->condition('status', 1);
    }
  }
}

This could also be considered as a temporary solution, but I thought checking on query was better, although less fool-proof:

/**
 * Implements hook_search_api_index_items_alter().
 */
function custom_search_api_index_items_alter(&$items, SearchApiIndex $index) {
  // Prevent nodes that are not accessible to anonymous users from being
  // indexed.
  if ($index->entity_type == 'node') {
    foreach ($items as $nid => $node) {
      if (!node_access('view', $node, drupal_anonymous_user())) {
        unset($items[$nid]);
      }
    }
  }
}

It didn't look like the SearchApiQueryInterface allowed joining on other tables or respecting query tags like normal SelectQueries, so it seems pretty much impossible to reliably work with entity or node access on query.

drunken monkey’s picture

It didn't look like the SearchApiQueryInterface allowed joining on other tables or respecting query tags like normal SelectQueries, so it seems pretty much impossible to reliably work with entity or node access on query.

Well, that would a) only fix the problem for the database backend, and it is also b) made impossible to do generically as Drupal doesn't specify a generic way to check entity access.

Dave Reid’s picture

True, but you are based on the Entity module which I believe does. :) Yeah I now see that adding the joins would only work for the db indexer. Tricky situation indeed.

drunken monkey’s picture

True, but you are based on the Entity module which I believe does. :)

Yes, but only with a function, not by altering a database query. (As far as I am aware?) Therefore, we'd be back at "check all results for access rights and potentially mess up paging".

Dave Reid’s picture

Yeah that's what I meant by tricky as that's not ideal at all. Maybe we need to look into how ApacheSolr respects node access.

drunken monkey’s picture

Yeah, maybe. But this still has the disadvantage of being node-specific. If we take care of node access, users will think it will work for other entities, too – other than when we clearly state that the user has to configure the index to mind access rights. So this might be a different security risk, in some cases.
Also I think that letting all search backends implement their own way of node/entity access is kinda dangerous, as when one would switch backends it could be that things that were secure previously, aren't secure anymore.

This really is a complicated topic, and I'm happy for all input. But if there was an easy solution, I'd have done that long ago. I'm of course aware that having a search module that doesn't do access checks on search results is a dangerous thing.

awolfey’s picture

subscribe

Jon Pugh’s picture

Dug this out of apachesolr.module's schema.xml:

 <!-- This field is used to store access information (e.g. node access grants), as opposed to field data -->
   <dynamicField name="access_*" type="integer" indexed="true" stored="false" multiValued="true"/>

I am going to start work on this next week, as its crucial to my current project, we are using flag_friend_access.module, so we'll have to use the node grants.

Dave Reid’s picture

I feel like there should be some kind of warning on the project page for users downloading the beta of this module to make sure they're aware of the issue and that it's being worked on.

drunken monkey’s picture

Good idea. I added a note to the release notes, seemed to be the best place. I don't think many people will read the whole project page before downloading the module – but I'm open to other suggestions where this should be noted.

Dave Reid’s picture

I think we should add this tag as it would prevent a 1.0 release since that's when the security team would have to get involved.

drunken monkey’s picture

Hm, I guess you're right there. We definitely have to do something there – I'd just like to have any idea, what.

Well, for want of anything better, we'll probably really just have to do entity_access() calls on all search results … And somewhere note that the user should aim to make this unnecessary, as it messes up paging.
I also hope, Wolfgang will finally do #939430: Integrate Rules conditions to filter indexed entities, so users can easily select that they don't want to index unpublished nodes in the first place.

Would you think it smarter to do the checks when using the results (in search views, search pages, …, separately) or directly in the query method (with some option to opt out of it)? With the former option we'd of course also have to add a note for developers, that they will have to keep this in mind.

Opinions? Alternatives? I'm open for both.

fago’s picture

I don't think we cannot generally do it when query-ing, as any module could react on the node-access query tag as it likes and we cannot replicate that.

So calling entity_access() on the results sounds good to me. I'm not sure how we should solve the paging issue though, maybe views folks could help us with that.

@node-access grants:
I think it would be a nice feature to be able to index node-access grants, but that's another issue. If search-api documents that it lists what is configured regardless that should be totally fine. Views already works the same way.

So given that assumption, entities not being index up2date are the remaining problem, as that's the only situation something might be "leaked". Additionally it might lead to not-nice side affects when a not up2date entity is listed, as the data from entity_load() might be totally different to what has been searched for.

So perhaps we can come up with a good way to remove entities from search results that are marked as dirty (=need to be reindexed). E.g. like passing NOT IN (ids) to each query? Or immediately mark them dirty in solr or so?

fago’s picture

or maybe promote the "index immediately" option more (= enable by default), so it's secure + allow disabling it for performance reasons + a note on why that's dangerous?

drunken monkey’s picture

So calling entity_access() on the results sounds good to me. I'm not sure how we should solve the paging issue though, maybe views folks could help us with that.

I'm pretty sure the paging issue cannot be solved, unless we always load all results, then do access checks and then display the correctly paged results.

I think it would be a nice feature to be able to index node-access grants, but that's another issue. If search-api documents that it lists what is configured regardless that should be totally fine. Views already works the same way.

Hm, so should it always call entity_access() or shouldn't it?

So given that assumption, entities not being index up2date are the remaining problem, as that's the only situation something might be "leaked". Additionally it might lead to not-nice side affects when a not up2date entity is listed, as the data from entity_load() might be totally different to what has been searched for.

I don't think we have to care much about that. When data is updated, it will hardly be top-secret from one moment to the next, and stale data will be a problem for nearly all search solutions (unless data is indexed immediately, but then we have the performance problem).

So perhaps we can come up with a good way to remove entities from search results that are marked as dirty (=need to be reindexed). E.g. like passing NOT IN (ids) to each query? Or immediately mark them dirty in solr or so?

To do that, we could just delete the items from the server, but I really don't think this is practical. When changing data, the changes will in probably 90% of cases not be drastic enough to warrant such a behaviour – and as said, even when the change is that drastic, data will hardly ever be top-secret from one moment to the next. And even then, the new data would get displayed and the old one only be inferred through the things being searched for.
And, as you say, if this is really crucial for you, there's always the "Index immediately" option, which we could point users to in the README and/or other documentation.

fago’s picture

>When data is updated, it will hardly be top-secret from one moment to the next, ...

oh, it might be. It might even be security-relevant... Consider workflow modules having different states.

And even if not, it's still not nice to get search results that display something that I've not searched for. Having entity-load showing the up2date version while the index used the old version is an odd discrepancy.

In the end it's for sure a trade-off between performance and being totally up2date, so it should be configurable. Still, search-api should really try to make users aware of that.

>To do that, we could just delete the items from the server, but I really don't think this is practical.
What that be much faster than just using the immediate update option?

>Hm, so should it always call entity_access() or shouldn't it?
Views doesn't do it always, the same way I do not think search api needs to do it always. (Views only does access checks for field-level data, but not for full rows). As else, you'd run into the paging problems...

drunken monkey’s picture

oh, it might be. It might even be security-relevant... Consider workflow modules having different states.

I consider them, and see neither anything contradicting my conclusion, nor your point.

Let's just discuss what to implement:
- Tell people in the README and documentation to build Views such that no inappropriate data is shown.
- - Still call entity_access() to be on the safe side, even if it messes up paging?
- There is no real alternative for search pages, other than not indexing unwanted items -> so we always call entity_access() for those results, even if it messes up paging.
- Add a note to the README and documentation that users for whom immediate reactions to data changes are important should just activate the indexes' "Index immediately" options.
- Add a note to the README and documentation for developers that they should take care about that stuff when using searches for their own purposes.
Does that sound about right?

drunken monkey’s picture

- - Still call entity_access() to be on the safe side, even if it messes up paging?

We could also provide a setting on the query class to this behalf, as a compromise. (I think this now works? Otherwise maybe a filter, as a workaround …) That way, we also won't have to load the entities unless desired by the user.

fago’s picture

> - - Still call entity_access() to be on the safe side, even if it messes up paging?

I do not think this is a good idea. If someone configures a view that would expose access-restricted data to users, the view would appear just broken. Why shouldn't it just work?
One needs to restrict not accessable items on query or index time (e.g. add filter for unpublished nodes..) anyway for the paging to work + people are used to that in views.

I think it would be a nice feature to be able to index node-access grants, but that's another issue. If search-api documents that it lists what is configured regardless that should be totally fine. Views already works the same way.

Hm, so should it always call entity_access() or shouldn't it?

It shouldn't, but have an option to index and apply node-access grants.

The only security-problem I see lies in stale index data combined with entity-load. Consider this example:
* Search-API is used to index all nodes.
* Search-API lists a published node.
* A content admin, adds some internal, secret notes and unpublishes the node, until the problem is fixed.
* If immediate indexing is off, the node index is out-of-date, so the filter for published nodes will still pass the node.
* The search result is retrieved, and the node is shown via node_load() + node_view(). As node_load() loads the latest node from db, the internal, secret note has been exposed.

So yes, it's an unlikely case as it has to be "published" first, still it might happen. And it will happen much more often in simple non-security relevant cases. User searches for a node with title "foo", but the result shown has no title foo.

It would be perfectly acceptable if the data would be just not up2date - that's the usual tradeoff people make to increase performance. But the combination of deprecated + updated data is bad as it leads to wrong and unexpected results.

drunken monkey’s picture

It shouldn't, but have an option to index and apply node-access grants.

I still don't see how that should work. Also, this would have the same problem you are constructing right below that.

So yes, it's an unlikely case as it has to be "published" first, still it might happen. And it will happen much more often in simple non-security relevant cases. User searches for a node with title "foo", but the result shown has no title foo.

It would be perfectly acceptable if the data would be just not up2date - that's the usual tradeoff people make to increase performance. But the combination of deprecated + updated data is bad as it leads to wrong and unexpected results.

Again: If people care about this problem that much, they can just activate "Index immediately" – or use a Solr server with "Retrieve result data from Solr" enabled.
Or, as said, (optionally) calling entity_access() would solve this, to be on the safe side.

fago’s picture

>I still don't see how that should work. Also, this would have the same problem you are constructing right below that.

I know but this issue is about access, so it is relevant. Apache solr module indexes the node access grants as the node access module return them and considers the grants when querying, thus it emulates the db-query node-access systems on solr. It least it worked that way in a d6 version I had a look at some time ago.

@stale-data:
I think that should be at least mentioned in the readme somewhere, though I fear it might drive the "performant" option off not-immediate indexing useless for lots of people. Thus I think it would be valuable for the module to provide a performant, maybe not up2date but consistent option. Anyway, that's off-topic here.

>Or, as said, (optionally) calling entity_access() would solve this, to be on the safe side.
It wouldn't solve the out-of-date problem, but it would avoid security implications.

j0rd’s picture

In the latest version of entity_api I believe $entity->entity_type was changed to $entity->item_type

So you'll need to update this code

function pe_blocks_search_api_query_alter(SearchApiQueryInterface $query) {
  $index = $query->getIndex();
  if ($index->item_type == 'node') {
    if (!user_access('bypass_node_access')) {
      $query->condition('status', 1);
    }
  }
}

I'm also curious why if I set a filter in views to get rid of status != 1 content, why it's still showing up in my faceted search view (even after rebuilding index & clearing cache). The above fix resolved issue for me though.

dawehner’s picture

As drunken monkey pointed out there is absolute no way to do this without alterting the query and getting the pager right.

Sadly many sites don't have just node published/unpublished but real complex node access systems, so just index immediately doesn't help... i know that's perhaps to obvious... just wrote it down.

The apachesolr module has the apachesolr_access module which implements this feature.
The idea is to add additional values to the index which are the realms of the node.

If the query is runned you can get all grants of the current user and filter by them.

Here comes some thought whether it's possible etc.:

Ah search_api_access module seems logical here. Perhaps it could use a generic entity_access module but there seems to be none at the moment. Might be worth to write one. In the meantime it could just use node_access and prepare for a generic one.

It seems to be that a processor is the logical way to add this realms to the item.
Perhaps there
SearchApiAbstractService::indexItem seems to input anything here into the index, not just fieldapi fields/metadata.

The next step would be to alter the query via processor::preprocessSearchQuery. There load the grants and filter by them.
Therefore it would be important to know the entity type which is queried here. Is this possible to get?
Probably yes, because the index is available and this index contains the entity type, at least for non-multi indexes(sorry for this bad name, don't know the right description).

So does it make sense to implement this as processor? Alternative it could be a module which some alter hooks.

If noone else steps forward i would like to do this doing drupalcon.

dawehner’s picture

If this is implemented there could be configuration on search_api_page and search_api_views query backend to disable query access manually. There could be the use case to see the teaser but not the full node. In this cases entity_access() is not a perfect solution, but I'm just thinking of possible options.

dawehner’s picture

Just tracking of work.

drunken monkey’s picture

Thanks for writing this up! All things considered, this probably really is the best way to add the necessary access functionality.

It's still unclear to me, though, how exactly these "grants" or "realms" are retrieved, stored and used for filtering? If the access system can be arbitrarily complex, how do we know what to store, and what to search for?

If this is solved, the rest really should be comparatively easy: probably a data alteration to add the necessary field (or add it by default? of course, we wouldn't have anything to put there for non-nodes, currently …) and then, as you say, a processor or an alter hook. This could (in either case) decide whether to run either based on a query option, or based on the search ID like currently done for facets.

In any case, DrupalCon would seem like a great opportunity to get this in – e.g., in/after the Search API BoF or during the code sprint on Friday.

drunken monkey’s picture

Ah, OK, this now makes more sense. Seems I didn't really understand node access before (if this really catches all cases).
However, as said, you'll need a data alteration to add fields to the index – adding them like this in the processor doesn't work (at least generally). You'll also have to know beforehand which fields you'll add – i.e., this can't depend on the nodes. However, it seems you'll know what realms might show up, beforehand, anyways.

drunken monkey’s picture

Or, you could probably use a single list<string> field and then index things like "node_all_REALM::GID". If that wouldn't have other flaws, it would maybe makes things cleaner on the "Fields" tab, where we'd otherwise have fields for each realm (which the user mustn't set to "not indexed"). (Don't know how many "realms" there normally are, though. As said, I'm embarassingly new to these details of node access.)

Also, in preprocessIndexItems(), $item currently stays the same for all nodes. Just so you don't search for that later.

dawehner’s picture

Thanks for the help!

If the field is required to register having just one additional field is the only possibility as you could have realms per nid (content_access + acl supports this for example).

Also, in preprocessIndexItems(), $item currently stays the same for all nodes. Just so you don't search for that later.

I'm confused about this, isn't the key of the $item the entity id?

It seems to be that the drupal node access system seems really flexible as you can implement it on different query backends.

drunken monkey’s picture

Status: Active » Needs work

If the field is required to register having just one additional field is the only possibility as you could have realms per nid (content_access + acl supports this for example).

OK, then we'll have to do it that way.

I'm confused about this, isn't the key of the $item the entity id?

What I meant was this:

+++ b/includes/processor_node_access.inc
@@ -0,0 +1,117 @@
+      foreach ($items as $nid => &$item) {
+        $nids[$nid] = $nid;
+      }

Here you iterate over all items.

+++ b/includes/processor_node_access.inc
@@ -0,0 +1,117 @@
+      foreach ($nodes as $nid => $node) {

Then you iterate over all nodes, but without setting $item accordingly.
Therefore, the fields you set on $item are always set on the last item, not on the items corresponding to the node you currently look at.

dawehner’s picture

Oh sure, changed this.

Sadly i can't find out how to add a field to $index->options['fields']. What is the proper way to do it?

Still there is no real testing done with this patch.

drunken monkey’s picture

The proper way to add fields is via a data alteration. These have a method, propertyInfo(), where you can return property information (in Entity API format) for new properties added by the data alteration. When a user then enables the data alteration for the index, the new properties/fields will automatically be available on the "Fields" tab.

Either the processor, or a hook, could then be responsible for adding the appropriate filters at search time. In any case, we'd have to check whether the field is really available and indexed on the index. Therefore, maybe a hook would be more appropriate, as this would then automate the process when you have the data alteration enabled and the field set to "indexed". Also having to enable a processor seems kind of redundant, then. Unless, of course, there would also be enough other uses for having that data indexed without activating the processor.

Regarding the code:

+++ b/includes/processor_node_access.inc
@@ -0,0 +1,115 @@
+            if (!isset($item['access_node'])) {

Missed a spot with $item.
Also, maybe something better namespaced, like "search_api_access", would be a better field name. Just to be sure that there are no collissions (I'm always a bit on the paranoid side in such things).

And another thing: Since this is, at the moment, only relevant for nodes, you should implement supportsIndex() (present on both data alterations and processors) and return FALSE for indexes not indexing nodes.

dawehner’s picture


Either the processor, or a hook, could then be responsible for adding the appropriate filters at search time. In any case, we'd have to check whether the field is really available and indexed on the index. Therefore, maybe a hook would be more appropriate, as this would then automate the process when you have the data alteration enabled and the field set to "indexed". Also having to enable a processor seems kind of redundant, then. Unless, of course, there would also be enough other uses for having that data indexed without activating the processor.

I can't think of a use case for a seperate processor, if someone doesn't want access he can disable it on the indexed fields.


Also, maybe something better namespaced, like "search_api_access", would be a better field name. Just to be sure that there are no collissions (I'm always a bit on the paranoid side in such things).

Sure it makes sense to prefix it.

Converted everything to an alter callback and a query alter hook and removed the processor.
See you in london

drunken monkey’s picture

Thanks again for improving on this! This is now really close to be committable, I think.
Some things I changed/added:

- Built in a way to define the user account used for access checks, and a way to bypass access checks for specific search queries.
- Built a way to use the latter of those into the Views integration (as a query setting).
- The "node_access__all" grant was not used in filtering, resulting in filtering out all nodes which are accessible by all users.
- As discussed, also added a condition for status = NODE_PUBLISHED (even taking the "view own unpublished content" permission into account).
- Some Search API internal stuff, and some code formatting according to my liking. ;)

The only thing missing from my perspective right now are a few sentences about the whole thing in the README.

To people caring about this, especially if you have some more complicated node access configurations: Please test this!
Code reviews would of course also be appreciated.

Quick how-to: If you have a node index and enable the "Node access" data alteration (and not set the corresponding "Node access information" field to unindexed), node access should magically start to work (after re-indexing, of course).

drunken monkey’s picture

Status: Needs work » Needs review
bobodrone’s picture

This patch works as described. After activating the data alteration and reindexing the node access system works fine.
I used content access module to control the access and I tried with two different roles to access a bunch of per-node-basis accessed content.

Anyone alse wanna give it a try?

/ bobodrone

dawehner’s picture

Status: Needs review » Needs work
+++ b/contrib/search_api_views/includes/query.incundefined
@@ -93,6 +93,35 @@ class SearchApiViewsQuery extends views_plugin_query {
+    $form['search_api_bypass_access'] = array(
+      '#type' => 'checkbox',
+      '#title' => t('Bypass access checks'),
+      '#description' => t('If the underlying search index has access checks enabled, this option allows to disable them for this view.'),
+      '#default_value' => $this->options['search_api_bypass_access'],

Nice feature!

+++ b/includes/callback_node_access.incundefined
@@ -0,0 +1,93 @@
+        'type' => 'list<token>',

I'm interested in general, what is the difference between a token and a string here?

+++ b/search_api.moduleundefined
@@ -1213,6 +1218,79 @@ function search_api_get_processors() {
+    $query->condition('status', 0);
+    $query->condition('status', 1);

is it sure for all the time that the operator is an AND?

The rest looks fine and is RTBC from my side as it works fine.

drunken monkey’s picture

Status: Needs work » Fixed
Issue tags: +API change

I'm interested in general, what is the difference between a token and a string here?

Nothing, really, it's just different nomenclature. I introduced the "string" type to differentiate from fulltext fields – then, fago came and called his type "token" in the Entity API. Therefore, it's now called "string" in Search API and "token" in the Entity API. The data specified in the propertyInfo() method gets handed to a Entity API wrapper, therefore the Entity API type has to be used.
Confusing, I know (I actually had this wrong myself in another spot), but blame fago! ;)

is it sure for all the time that the operator is an AND?

Yes, the outermost filter is always sure to have the AND operator, like it is for DB select queries.

Thanks for reviewing again so thoroughly!

I now included the following in README.txt (as the first paragraph of „Information for users“):

IMPORTANT: Access checks
  In general, the Search API doesn't contain any access checks for search
  results. It is your responsibility to ensure that only accessible search
  results are displayed – either by only indexing such items, or by filtering
  appropriately at search time.
  However, this doesn't apply to searches on general site content (item type
  "Node") – for these, correct entity access checks are implemented, so you
  don't have to take care of that in any way yourself. Permissions on field
  level aren't checked in any case, however. Also note that some search types
  (e.g., search views) might provide the option to disable this access feature
  for individual searches.

I guess that about does it. I'll also include a similar note on the project page.
Can anyone think of any corrections, clarifications or additions to that?

Anyways, I just committed this whole thing for now. Thanks again for helping out with this, Daniel!

Even if it is just a small API addition (the additional recognized option in the query class), I guess this qualifies as an API change, so tagging accordingly.

drunken monkey’s picture

Status: Fixed » Needs work

Sorry, this is of course wrong, users have to enable the "Node access" data alteration for that to work …
Will commit a correction in the next few minutes.

drunken monkey’s picture

OK, new version of this section:

IMPORTANT: Access checks
  In general, the Search API doesn't contain any access checks for search
  results. It is your responsibility to ensure that only accessible search
  results are displayed – either by only indexing such items, or by filtering
  appropriately at search time.
  For search on general site content (item type "Node"), this is already
  supported by the Search API. To enable this, go to the index's "Workflow" tab
  and activate the "Node access" data alteration. This will add the necessary
  field, "Node access information", to the index (which you have to leave as
  "indexed"). If this field is present, access checks will automatically be
  executed at search time, showing only those results that a user can view. Some
  search types (e.g., search views) also provide the option to disable these
  access checks for individual searches.
  Please note, however, that these access checks use the indexed data, while
  usually the current data is displayed to users. Therefore, users might still
  see inappropriate content as long as items aren't indexed in their latest
  state. If you can't allow this for your site, please use the index's "Index
  immediately" feature (explained below) or possibly custom solutions for
  specific search types, if available.

This now also mentions the "stale data" problem, and the entity_access() solution that we might want to additionally implement for search views.

Incidentally, attached is a patch that would do this (add an option to execute entity_access() on result entities to search views). Testing, reviews and opinions very welcome!

travis-cunningham’s picture

I ran patch 44 and 50 but I am not seeing the field "Node access information" and I am not seeing the Node Access Filter in the Workflow.

This is while looking at an existing Node Index and I have flushed cache, disabled and re-enabled the module as well as adding some test text to one of the other filter descriptions to confirm the changes are being applied.

Also, I am current on Entity, Search API, and Core.

Am I missing a step/patch? Is there anything else I can do to debug this?

drunken monkey’s picture

Status: Needs work » Fixed

This has already been committed, and I have no idea why you shouldn't see this option.
Is the code right, i.e., does it return a „Node access“ data alteration in search_api_search_api_alter_callback_info()? Then you could debug the functions called in the data alteration, in includes/callback_node_access.inc.

Moving the remaining Views proposition to a new issue, #1278942: Add an option to apply entity_access() to Views results, and closing this one (to avoid confusion about the tags).

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

travis-cunningham’s picture

Just wanted to followup on this. I was able to get the latest DEV version of the Search API (Entity and Search Pages) to work but I wanted to note that if you have an existing index it needs to be cleared prior to adding the new module. Failure to have a cleared index will result in several PHP exception errors (whitescreen) that will prevent the update.php script from running (along with almost everything else).

If this does happen to you just need to rollback the modules, clear the index/disable it, and then drop the new modules back in place. Run the update, re-index, and back to normal.

Thanks again for the notes on debugging the data alteration in the callback_node_access.inc.