Hi,

If the default language of a Drupal installation is, e.g., English, and you add something like

=en=
Hello
=de=
Hallo
=qq=

in a body, the string "Hallo" will not be found when searching through the site.

As far as I can see, that's because Drupal builds the node body for the search index with

  $node->build_mode = NODE_BUILD_SEARCH_INDEX;
  $node = node_build_content($node, FALSE, FALSE);
...

and in node_build_content the filters are applied via check_markup, so all content for the non-default languages gets lost.
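
To make the effect concrete, here is a minimal sketch in plain PHP. It is not the actual Language Sections code: `ls_sketch_filter()` is a hypothetical stand-in for the filter's 'process' step, and it assumes that `=qq=` (like `=all=`) marks text that is shown for every language.

```php
<?php
/**
 * Hypothetical, simplified stand-in for the Language Sections filter.
 * Keeps text before the first marker, sections for $current_lang, and
 * sections marked =qq= or =all= (assumed here to mean "every language").
 */
function ls_sketch_filter($text, $current_lang) {
  // Split on markers such as =en= or =de=, keeping the codes in the result.
  $parts = preg_split('/=([a-z]{2,5})=/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
  $output = $parts[0];
  for ($i = 1; $i + 1 < count($parts); $i += 2) {
    if (in_array($parts[$i], array($current_lang, 'qq', 'all'), TRUE)) {
      $output .= $parts[$i + 1];
    }
  }
  return $output;
}

$body = "=en=\nHello\n=de=\nHallo\n=qq=\n";
// The indexer renders the body like a normal page view, so with English as
// the site default only this filtered text reaches the search index.
$indexed = ls_sketch_filter($body, 'en');
var_dump(strpos($indexed, 'Hello') !== FALSE);  // bool(true)
var_dump(strpos($indexed, 'Hallo') !== FALSE);  // bool(false)
```

Since only the filtered text is handed to the indexer, "Hallo" never becomes a searchable word.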

I'm thinking about a solution. Maybe it is possible to use hook_nodeapi in language_sections to reload the node body if $node->build_mode is NODE_BUILD_SEARCH_INDEX, and then apply all filters except the language sections filter. But I'm not sure yet.

Has anyone ever thought about this problem? Any ideas or proposals how we could get around this?

Comments

Andy Inman’s picture

Category: bug » feature

I've changed this to "feature request" as I don't really think it counts as a bug in LS as such.

We would need to understand better how the built-in search facility works in a multi-language environment. I for one don't claim to understand it. At a guess, LS might be able to detect that the cron indexing process is running, and so behave differently, presumably presenting *all* text (*all* languages) for indexing. We need input here from folks who understand how the indexing works. I don't have time to go investigating, sorry.

Frank Steiner’s picture

  • With multi-language support, every page in every language is indexed, and you find any page no matter what language you are currently using when viewing the search page. I.e., you find English pages when searching for English words even if your current language is German.
    When you click on the search result, you will be switched to the right language.
  • With i18n things are different, as you can choose which content you want to see depending on the current language. If you select e.g. "Current language and no language" and your current language is German, you will not find any content of English pages, as i18n rewrites all the queries to match the current language (or language-neutral) when selecting nodes.
  • As far as I understand the code, Drupal doesn't tag anything with a language in the search index.

    For language sections this means that we would find pages with English content when searching for it, but when clicking on the result, we would see the German content if German was our current language.
    I'm not sure if we can manipulate search results to switch to the language in which the string was found (we could figure that out). We have hooks, but I'm not sure how to fetch the correct URL information, as there can be different situations (language prefix in the path or in the domain etc.). I feel this should be doable.

  • To get all languages indexed we need just one little change:
    function language_sections_filter($op, $delta = 0, $format = -1, $text = '') {
      ...
      switch ($op) {
        case 'process':
          if (request_uri() == '/cron.php') return $text;
    

    Using request_uri() was the only way I found to figure out that we are being called from the cron script. Everything else (cron_semaphore etc.) would conflict with normal page loading, especially if a cron run gets stuck.
    We could also use nodeapi('update index'). But that would match only nodes, and it would be more complicated to figure out which parts to add due to other filters. We could e.g. remove the separators and call check_markup again so that PHP code etc. is removed.
    Just checking the request URI works for all types of text, not only nodes, and is considerably simpler.

  • We would still need to hook into manually called indexing (e.g. the biblio module indexes pages when inserting/updating them). Here I'm not sure how to do this efficiently. We can't hook into search_index, but any module could call it. We could capture nodeapi by using some global flag or something similar, but I think we won't find a clean solution. So we might have to accept that multilanguage content is only indexed during cron runs.

The cron.php patch is working fine for my site. I could try to alter the search result links for changing to the language of the result, but first I'd like to hear your opinion on all this.
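
A strict comparison with '/cron.php' would miss sites installed in a subdirectory and requests that carry a query string. The following is a minimal sketch of a more tolerant check in plain PHP. The helper name `ls_sketch_is_cron_request()` is hypothetical, and inside the filter the argument would presumably come from request_uri().

```php
<?php
/**
 * Hypothetical helper: TRUE if the request path ends in cron.php.
 * Unlike a strict comparison with '/cron.php', this also matches sites
 * installed in a subdirectory and requests that carry a query string.
 */
function ls_sketch_is_cron_request($request_uri) {
  $path = parse_url($request_uri, PHP_URL_PATH);
  return is_string($path) && basename($path) === 'cron.php';
}

var_dump(ls_sketch_is_cron_request('/cron.php'));             // bool(true)
var_dump(ls_sketch_is_cron_request('/drupal/cron.php?x=1'));  // bool(true)
var_dump(ls_sketch_is_cron_request('/node/1'));               // bool(false)
```

Like the original check, this only recognises cron runs that arrive as a web request to cron.php. Indexing triggered any other way would still see the filtered text.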

stratosgear’s picture

subscribing

Andy Inman’s picture

Status: Active » Needs work

I've finally got round to going through the issue queue :) Thanks to Frank for that info and thought. It occurs to me that simply testing that cron.php is running might have unwanted side effects, i.e. some code could be running (other than search indexing) which needs to see the language-specific text. So, if this feature were implemented, it probably needs to be a configurable option.

It seems to me that using hook_nodeapi and checking for $op == "update index" may provide a solution: "The node is being indexed. If you want additional information to be indexed which is not already visible through nodeapi "view", then you should return it here." As Frank points out, there are limitations and potential problems, but the advantage is that it should work with any type of search module, not just the standard Drupal search. Further input welcome.

Frank Steiner’s picture

I see the problems with the cron check. Surely the nodeapi way is better, I'm just not that sure how to build the node body. I guess we would need to refetch the whole body and apply all filters usually applied, except for the language sections filter, would that be right?

Andy Inman’s picture

Applying filters is normally done by node_prepare - I suppose that node_prepare is already called during the indexing process, before the actual index updating is done - that must be the case otherwise the indexing would never see filtered text. So I think all we need to do is:

1. Use hook_nodeapi $op == "update index" to set a flag for LS.
2. In LS standard processing (will be called by node_prepare), check for the flag and if it is set do no filtering.
3. Clear the flag.

Easy!?

Frank Steiner’s picture

I'm not sure it's that easy because of the following code in _node_index_node:

  // Build the node body.
  $node->build_mode = NODE_BUILD_SEARCH_INDEX;
  $node = node_build_content($node, FALSE, FALSE);
  $node->body = drupal_render($node->content);

  $text = '<h1>'. check_plain($node->title) .'</h1>'. $node->body;

  // Fetch extra data normally not visible
  $extra = node_invoke_nodeapi($node, 'update index');

Thus, node_prepare will be called (from node_build_content) before nodeapi('update index'), so we cannot set the flag.
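
The ordering problem can be reproduced with a toy model in plain PHP. Nothing below calls Drupal, and the class and method names are hypothetical. The sketch only mirrors the call order of the excerpt above: the body is built (and filtered) first, and the 'update index' hook runs afterwards.

```php
<?php
/**
 * Toy model of the call order in the _node_index_node() excerpt.
 * All names are hypothetical; nothing here calls Drupal.
 */
class LsSketchOrder {
  public static $skip_filter = FALSE;
  public static $log = array();

  // Stands in for the LS filter, reached via node_build_content().
  public static function filter($text) {
    self::$log[] = self::$skip_filter ? 'filter skipped' : 'filter applied';
    return $text;
  }

  // Stands in for hook_nodeapi() with $op == 'update index'.
  public static function nodeapiUpdateIndex() {
    self::$skip_filter = TRUE;
    self::$log[] = 'flag set';
  }

  // Same order as the core excerpt: build (and filter) first, hook second.
  public static function indexNode($body) {
    $rendered = self::filter($body);
    self::nodeapiUpdateIndex();
    self::$skip_filter = FALSE;  // Step 3 of the proposal: clear the flag.
    return $rendered;
  }
}

LsSketchOrder::indexNode('=en=Hello=de=Hallo');
print implode(', ', LsSketchOrder::$log) . "\n";  // filter applied, flag set
```

The log shows that the filter has already run by the time the flag is set, so the flag never influences the text that gets indexed.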

Andy Inman’s picture

Anybody want to try and make a patch?

Frank Steiner’s picture

I thought about this for a long time, but I don't see any clean way to handle this with the flag. The other approach, i.e. re-fetching the body and applying all filters except language sections, seems too complicated to me because of roles. I wouldn't know which filters to apply, as there is no "standard" role.
That's why I ended up with the cron solution: I just couldn't do any better :-) So I'm stepping back :-)

Andy Inman’s picture

Well, thanks for trying! From your #7 it seems that testing for $node->build_mode == NODE_BUILD_SEARCH_INDEX might work and would not have the unwanted side-effects that testing for a cron run could have.

Jurgen8en’s picture

Hi,

For Drupal 5.x, looks like the function language_sections_filter is never called through cron.
There is also no build mode or ...

In my situation I only use language sections for one node-type: uc_product

I added

function uc_product_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  if ($op == 'update index') {
    return db_result(db_query('SELECT body FROM {node_revisions} WHERE nid = %d', $node->nid));
    //return '<strong>('. implode(', ', $output) .')</strong>';
  }
}

Ran cron, updated a node. Searched, but it doesn't work.
Any help is welcome..

Jurgen
www.cardKeyfinder.com

Frank Steiner’s picture

> From your #7 it seems that testing for $node->build_mode == NODE_BUILD_SEARCH_INDEX
> might work and would not have the unwanted side-effects that testing for a cron run could have.

I'm afraid it won't, because node_build_content calls node_prepare, which calls check_markup before we can intervene with any nodeapi function or something similar.

Andy Inman’s picture

Status: Needs work » Closed (works as designed)

Happy 2010 to all!

I've taken another look through this thread, and re-read various Drupal docs trying to find a simple solution - I don't think there is one. Any solution would involve hooks and processing that would not normally be performed by an input filter. So, I've decided to close this issue, along with a couple of others that are related to using LS with nodes. My reasoning is, LS is an input filter, it's not necessarily dealing with node bodies. So, in order to keep LS short and sweet, any additional functionality needed specifically to handle nodes should be a separate module. The same goes for http://drupal.org/node/313770 which is about translation of node title and http://drupal.org/node/312736 which is about menu translation.

In summary, LS is an input filter and should stay that way. I know that broader functionality is needed to handle many situations in multi-language sites, and I think that needs to be provided by application-specific modules, keeping LS as a generic filter. Such modules can then call LS to do text filtering, and we keep things nice and modular.

Andy Inman’s picture

Status: Closed (works as designed) » Closed (fixed)

Andy Inman’s picture

Status: Closed (fixed) » Active

I'm re-opening this. Now that I have LS Extras as an "add-on" to LS, maybe I can build a solution as another LS Extras module. I will take a look, and others are welcome to do the same! So, really this should now be in the LS Extras issue queue, but there's no way to do that.

Andy Inman’s picture

Status: Active » Needs review

Ok, now there is a new module called LS Search as part of LS Extras - you will need the current dev release of both LS and LS Extras to use it. It simply appends the entire body of the node to the output used for search indexing. So that means you will get the site-default language section twice - not ideal, but this is a starting point. If it seems to work in practice, then I guess the next step is to prevent the site-default language section from getting duplicated. The extra processing only happens during _node_index_node processing, so it should not cause side-effects in standard node display.
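
The duplication can be illustrated with plain strings. This is only a sketch, not the LS Search code: `$rendered` stands for the filtered body as produced for the site-default language, `$raw` for the stored body, and simple concatenation is an assumption about what appending the entire body amounts to.

```php
<?php
// Stored body with two language sections (example from the original post).
$raw = "=en=\nHello\n=de=\nHallo\n=qq=\n";
// What normal rendering yields when English is the site default.
$rendered = "\nHello\n";
// First LS Search approach as described: rendered output plus the raw body.
$indexed = $rendered . $raw;

print substr_count($indexed, 'Hello') . "\n";  // 2: default language twice
print substr_count($indexed, 'Hallo') . "\n";  // 1: other language once
```

Counting a default-language word twice may skew its weight in the index relative to the other languages.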

Dev releases should be visible within the next 12 hours - or grab them from CVS.

I await your feedback/patches!

EDIT: Now updated, see below.

Andy Inman’s picture

Changed it to work a different way. This one seems to work fine under my tests (no duplicated content now.) Not sure if you may need to fiddle with system table weight for the module - if it doesn't work, try setting weight higher or lower.

If using LS Node Titles, that seems to work properly for search too.

I am not sure if comments will be properly handled if they also use LS - is anybody doing that?

Andy Inman’s picture

It's very quiet in here - Is anybody following this?

Frank Steiner’s picture

It's on my watch list but our drupal installation is frozen as long as we don't have a new development slot. I will take a look at this definitely, but not in the next 3 or 4 weeks...

Andy Inman’s picture

Hi Frank, thanks for the update.

Andy Inman’s picture

Someone please test this! It works ok on my configuration, but given that there are many possible configurations (languages, search modules etc.) I would prefer to get some feedback before adding it into the next release. To test, get latest dev versions of LS Extras and Language Sections, and activate the LS Search sub-module.

Frank Steiner’s picture

I'm not able to setup a clone test server for the moment, and for the production server we are not supposed to fiddle with -dev or beta versions. So I can't promise a test for now. Should you just add it to a stable release number, we will automatically update to it and then I can easily test on the production server and report back :-)

Andy Inman’s picture

So they trust me more than they trust you!? :) Not quite so easy as I'd need to do a new release of Language Sections too - I don't really want to do that until this is tested or I have some other features or fixes to add.

Anyway, no rush - it was you who first raised the issue! :)

Andy Inman’s picture

Happy 2011 (polite reminder :)

mertskeli’s picture

netgenius, thank you for the module.
The idea is really brilliant.

I have tested LS Search.
Unfortunately, it doesn't work.
1. There's an error in
$result = db_query_range('SELECT n.nid, n.format, c.last_comment_timestamp FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid WHERE n.status = 1 AND n.moderate = 0 AND (n.created > %d OR n.changed > %d OR c.last_comment_timestamp > %d) ORDER BY GREATEST(n.created, n.changed, c.last_comment_timestamp) ASC', $last, $last, $last, 0, $limit);
Wrong query for n.format and c.last_comment_timestamp
2. Only default language is being indexed.
3. Indexing is not performed at all if a body contains <!--break--> for teaser separation. Related to that, there's also a kind of support request:
Let's say we have:

=en=
<!--break-->
English Text
=de=
Title: Deutsch Title
<!--break-->
Deutsch Text

where <!--break--> is placed to prevent teaser generation,
in such a case, =en= becomes a teaser.
Well, you can insert extra <!--break--> in the very beginning (before =en=), but that's rather strange.
Also, in =de= section what should go first: Title: Deutsch Title or <!--break--> ?

vasrush’s picture

Your module seems to be a great solution for creating multilingual sites,
but unfortunately the indexing part is pretty major for any kind of website.
I've tried LS Search with no luck too.

Can you please update the latest development links in ls_extras page, cause the old CVS
links are dead. I would like to test the latest development version.

Thanks in advance

Andy Inman’s picture

Re. the broken links ... use "Show all releases" on the project page.

As it says on the project page, LS was never intended as a complete solution for multi-language content - just a way of handling perhaps a few pages or blocks, Views header/footer, etc. I'll take another look at the Search issue - I had it working under test, so I'm not sure why it's not working for others. But really, if you want searchable multi-language content, the obvious way is to use standard Drupal node translation. I am not clear as to why anyone would not want to do that.

mertskeli’s picture

> ...if you want searchable multi-language content, the obvious way is to use standard Drupal node translation. I am not clear as to why anyone would not want to do that.

And for a non-node custom module? LS is really very convenient for that. Besides, node translation means 2 different pages, thus 2 different paths and other consequences.

Also, it is not quite clear how the LS search works. Imagine a field with several languages, both having similar words, e.g. "Buy Panasonic XX-123" and "Comprar Panasonic XX-456". If an english user searches for "XX-456", will a page with this field be shown to him?

vasrush’s picture

Thanks for your response,

Re. "I am not clear as to why anyone would not want to do that" ...

I have built some multilingual sites using standard Drupal node translation with no problem, but when it comes to building a multilingual Ubercart site, your solution is much easier (one node per product regardless of language). It saves the time otherwise spent on patch after patch to synchronize stock and attributes and to work around other translation problems.

Andy Inman’s picture

Thanks mertskeli and vasrush - so, both using LS primarily with Ubercart? I don't know enough about UC - maybe a UC-specific module is what's really needed? The issue is that a product needs to be a single node (for pricing, stock, etc.) but the product description needs to be translated to different languages. Is there more to it than that?

mertskeli’s picture

I'm using (plan to use, testing for now) LS with a custom e-commerce module, not UC.
Any content type (including the one created upon installation by UC called "product") is just a set of custom fields. Each of the fields can be passed through check_markup with all the benefits of LS filtering. It means that every field can contain all the language deviations, including stock and price specific fields. It is a very serious advantage of LS.
Imagine such a schema:
price_en ...
price_de ...
price_fr ...
stock_en ...
stock_de ...
stock_fr ...
Instead of all these numerous fields you can have just a single price field and a single stock field, and add as many languages with LS as you like.
For the prices you can even have different prices per countries.

But it is not only about e-commerce. The ability to have multilingual content within a single node (or a custom content type) which switches per language negotiation moves Drupal to another height.

Still, LS search is a very serious issue that prevents professional use of LS.

Core content translation is a suitable solution for blogs, let's say, but for a website with thousands of pages and dozens of fields it is almost impossible. If you use just the default node/123 addressing you can bear it, but as soon as you make meaningful paths it becomes a horror.

vasrush’s picture

No need for UC-based module or something else. Language section is working great.
Really great solution that solves all my translation problems.
The only downside is LS Search.
It makes your site unsearchable that prevents me from using this great module at live sites.

Andy Inman’s picture

Thanks to both mertskeli and vasrush, now I understand!

Also, re. #28 ...

> Also, it is not quite clear how the LS search works. Imagine a field with several languages, both having similar words, e.g. "Buy Panasonic XX-123" and "Comprar Panasonic XX-456". If an english user searches for "XX-456", will a page with this field be shown to him?

It works by disabling LS filtering while updating the search index, so the entire content (all languages) will be indexed for that node. So yes, in your example, XX-456 should be found. I think that's the only feasible approach, and I think it will work fine in practice.
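
As a rough illustration of that approach (not the LS Search code), here is a toy model in plain PHP. The class, its methods and the simplified section parser are all hypothetical, and `=es=` is assumed as the marker for the Spanish section of the example.

```php
<?php
/**
 * Toy model: filtering is bypassed while indexing, so every section is indexed.
 * Class and method names are hypothetical, not taken from LS Search.
 */
class LsSketchIndexer {
  public static $indexing = FALSE;

  // Simplified LS-style filter: keep only sections for $lang, unless indexing.
  public static function filter($text, $lang) {
    if (self::$indexing) {
      return $text;
    }
    $parts = preg_split('/=([a-z]{2,5})=/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $output = $parts[0];
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
      if ($parts[$i] === $lang) {
        $output .= $parts[$i + 1];
      }
    }
    return $output;
  }

  // Returns the lower-cased words that would be indexed for a node body.
  public static function indexWords($body, $default_lang) {
    self::$indexing = TRUE;
    $text = self::filter($body, $default_lang);
    self::$indexing = FALSE;
    return preg_split('/[^a-z0-9-]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
  }
}

$body = '=en= Buy Panasonic XX-123 =es= Comprar Panasonic XX-456';
$words = LsSketchIndexer::indexWords($body, 'en');
var_dump(in_array('xx-456', $words, TRUE));  // bool(true)
var_dump(strpos(LsSketchIndexer::filter($body, 'en'), 'XX-456') !== FALSE);  // bool(false)
```

The second check shows the other side of the trade-off: the node is found for "XX-456", but the text rendered for an English visitor does not contain that term. In this toy index the section markers themselves also end up as words.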

I'll take a look later as to why LS search isn't working.

Re. #25 .. Instead of...

=en=
<!--break-->
English Text
=de=
Title: Deutsch Title
<!--break-->
Deutsch Text

try ...

=en=
English Text (teaser)
=de=
Title: Deutsch Title
Deutsch Text (teaser)
=all=
<!--break-->
=en=
English Text (after the teaser)
=de=
Deutsch Text (after the teaser)

... I think that will work.

Frank Steiner’s picture

Hi,
back after a long time ;-) I installed ls_extras 6.x-1.x-dev and language_sections 6.x-2.x-dev, is that as it should be?
After installation and running update.php I activated the "Language Sections Search" module (and only this, no other module from LS Extras). Now, at every cron run (I invoke it manually to get some new test pages indexed) I get the following error:

user warning: Unknown column 'n.format' in 'field list' query: SELECT n.nid, n.format, c.last_comment_timestamp FROM node n LEFT JOIN node_comment_statistics c ON n.nid = c.nid WHERE n.status = 1 AND n.moderate = 0 AND (n.created > 0 OR n.changed > 0 OR c.last_comment_timestamp > 0) ORDER BY GREATEST(n.created, n.changed, c.last_comment_timestamp) ASC LIMIT 0, 20 in /usr/share/drupal/modules/ls_extras/ls_search/ls_search.module on line 36.

Also, the LS Search has no effect. I created a new page with

=de=Coca Cola
=en=Pepsi Cola
=qq=

After a manual cron run, the page is found when searching for "Coca", but not when searching for "Pepsi". I.e., just as before :-(
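
Regarding the warning: in a stock Drupal 6 schema the format column belongs to {node_revisions}, not {node}, which would explain "Unknown column 'n.format'". One possible correction is sketched below. It is untested against LS Search itself, it assumes that the intent is to read the input format of each node's current revision, and the variable name `$ls_sketch_sql` is made up for illustration.

```php
<?php
// Sketch only: the query string from the warning, with the input format read
// from {node_revisions} (joined on the current revision id) instead of {node}.
$ls_sketch_sql = 'SELECT n.nid, r.format, c.last_comment_timestamp'
  . ' FROM {node} n'
  . ' INNER JOIN {node_revisions} r ON r.vid = n.vid'
  . ' LEFT JOIN {node_comment_statistics} c ON n.nid = c.nid'
  . ' WHERE n.status = 1 AND n.moderate = 0'
  . ' AND (n.created > %d OR n.changed > %d OR c.last_comment_timestamp > %d)'
  . ' ORDER BY GREATEST(n.created, n.changed, c.last_comment_timestamp) ASC';

// Inside Drupal 6 this would be passed to db_query_range() as before:
// $result = db_query_range($ls_sketch_sql, $last, $last, $last, 0, $limit);
print $ls_sketch_sql . "\n";
```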

Andy Inman’s picture

Welcome back Frank :)

Thanks for the info. Yes, I think the versions you have installed are correct - it seems LS Search simply doesn't work right now. I will take another look at it when I get a moment.

mertskeli’s picture

netgenius, thank you for the hints.

> So yes, in your example, XX-456 should be found. I think that's the only feasible approach, and I think it will work fine in practice.

Probably it is not so good...
"Buy Panasonic XX-123" is within the English section only. So while searching for "XX-456", the result will be referring to this node, and while opening it the English speaking user will not see any XX-456 in the text, only XX-123.
I assume the search for English should be done only within the English section, and "XX-456" search string should return no result.

Frank Steiner’s picture

With the ls_extras-6.x-1.x-dev version from 2011-May-29 it works for me :-) After a cron run I find the same page no matter if I search for a term only in the =de= or in the =en= section :-) Nice!

The problem mertskeli refers to is indeed not trivial. It is very confusing when your current language is German, you search for an English term, get a result, and after clicking on it you don't find the English term on the page. But I think this can't be solved, because Drupal doesn't know about different languages in the indexed version of the page. For Drupal, it is just one page with words in different languages. The search module doesn't know about the language sections tags, so how should it know that it should ignore some part of the indexed content?

What might work is a hint in bold and red on the search result page, stating that the result might not exist in the default language and that people should switch to the language of the search term (either before searching or when looking at the results).
I'm not sure if one can hook into the search result page to change the output with such a hint?

Andy Inman’s picture

@Frank - thanks for trying the 2011-May-29 version - I took a different approach and it does seem to work well.

Note:

* A title set with LS Titles will not be indexed (this will be fairly easy to add.)
* Comments aren't handled - if comments include LS sections, only the default language part would be indexed.
* It does not currently need a dev version of LS - the 2.5 release should work.

@mertskeli (and Frank) - Regarding the problem of "wrong language" words in search results - I agree it *is* a problem. I don't speak German, but as an example the word "once" in Spanish means "eleven", and "sin" in Spanish means "without" - search results could be very confusing! I'll do some more research, but I think this will be difficult to solve. The underlying problem is that the entire search subsystem is not really language-aware. There's some related discussion here: http://drupal.org/node/316147

mertskeli’s picture

Have to agree that the Drupal's search is the one to blame.
It would be nice if LS Search could cope with it somehow. Probably a search module from scratch?

I also tried to create a custom content type (not a node type), with dedicated fields per language (like 'body_en', 'body_de', etc). While such an approach has its advantages, and makes it possible not to use LS at all, it still shows all the drawbacks of the default core search. So I switched back to LS, and made up my mind that it would be better to sacrifice searching (for now). Hope netgenius will be genius enough to solve it :)

Andy Inman’s picture

I've now made release 1.10 which includes an updated LS Search module supporting titles set with LS Titles.

"Wrong language" search results

I've researched this further and conclude that there is no simple solution. Possibly a viable solution would be to use a theme override function to selectively remove results from the search page, where those results do not include the requested search terms for the current language. This would involve re-searching the results after applying filtering to the node body.
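
The re-searching step could look roughly like the sketch below, in plain PHP. The function name, the shape of the result items and the `$render` callback are all hypothetical. In a real override the callback would have to load the node and run its body through the filters for the current language.

```php
<?php
/**
 * Hypothetical post-filter for search results.
 *
 * @param array $results   Result items; their shape is up to the caller.
 * @param string $keys     The search keywords as typed by the user.
 * @param callable $render Returns the text of a result as shown in the
 *                         current language (i.e. after LS filtering).
 */
function ls_sketch_keep_visible_results(array $results, $keys, $render) {
  $terms = preg_split('/\s+/', trim($keys), -1, PREG_SPLIT_NO_EMPTY);
  $kept = array();
  foreach ($results as $result) {
    $visible = call_user_func($render, $result);
    foreach ($terms as $term) {
      // stripos() is case-insensitive for ASCII only; good enough for a sketch.
      if (stripos($visible, $term) === FALSE) {
        continue 2;  // A term is missing in this language: drop the result.
      }
    }
    $kept[] = $result;
  }
  return $kept;
}

// Usage with pre-rendered English text standing in for the filtered body.
$results = array(
  array('nid' => 1, 'visible' => 'Buy Panasonic XX-123'),
  array('nid' => 2, 'visible' => 'Buy Panasonic XX-456'),
);
$render = function ($result) {
  return $result['visible'];
};
$kept = ls_sketch_keep_visible_results($results, 'xx-456', $render);
print count($kept) . ' result, nid ' . $kept[0]['nid'] . "\n";  // 1 result, nid 2
```

A plain substring test is cruder than the matching done by the search module itself, and removing items after the query has run would leave result counts and paging out of step, so this is a starting point at best.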

Reminder: LS Search processes the node body. CCK text fields will not be included in the search index (except for the site default language.)

dimitriseng’s picture

@ netgenius - Thank you very much for this great module. Like others in this issue, I am trying to build a site using Ubercart, and the LS and LS Extras modules seem like a better approach than the standard Drupal and i18n approach, as those do not work very well with Ubercart (at the moment at least).

I am using the latest versions of Drupal, LS and LS Extras and all seems to be working great, apart from the LS Search, which only returns the content in the default language (I am using 2 languages). From what I can gather from the comments above, this has not been resolved yet, let me know otherwise. I don't mind if the search results return results only from the current language or both, as long as results are being returned at least for the content in the user's current language. I hope that you or somebody else will manage to find a solution for that as otherwise it could be a problem to use in production sites. Thanks again! :)

dimitriseng’s picture

@ netgenius - As per #41, did you have the time to investigate further the LS Search issue? This is the only blocking issue for me in order to use LS in a production site. Many thanks for your great work.

Andy Inman’s picture

@dimitriseng are you using v1.10 of ls_extras? The included version of ls_search should handle content in different languages correctly. Note, it (currently) only deals with node bodies - it does not process any CCK text fields.