Hello,

I am trying this module, and managed to make it work. But now I have a problem:

I have run cron several times and checked the number of items indexed per cron run, etc. But after a few cron runs, the number of un-indexed items remains the same, and search cannot find some of the nodes that I created recently. In my situation the number of un-indexed pages started at 790 and dropped to 290 after some cron runs, but it is now stuck at that number.

I am attaching the settings of SBP and the screenshot of the search settings page.

Any help will be appreciated. Thanks...

File                  Size        Author
search_settings.png   4.63 KB     Sinan Erdem
SBP_settings.png      221.22 KB   Sinan Erdem

Comments

jhodgdon

Status: Active » Postponed (maintainer needs more info)

Is it possible that you have some nodes that are causing problems in search indexing? For instance, if one node has PHP code in it, or for some reason takes a very long time to index, perhaps cron is running out of time and breaking, and then never getting past this node.

Maybe you could try on the SBP Settings page, disabling one content type at a time, until you find the one that is causing the problem? That might help. You might also try temporarily disabling the SBP Attachments module and see if that is causing the problem.

Once it is narrowed down to one content type or attachment type that is a problem, then we can look further and see why. Hopefully.

Sinan Erdem

Thank you for your attention. I will try what you suggest...

Sinan Erdem

I have 10 different content types. Each has approximately 50 to 100 items of content.

What I did is:

1. Cleared the whole index from the search settings page.
2. Removed all content types from the Search by Page settings.
3. Added only one content type.
4. Ran cron.
5. Searched for a keyword: the content type was indexed correctly.
6. Cleared the index again and went back to step 3 to try another content type.

I did these steps for every content type and saw that every one of them was indexed correctly.

7. Then, after clearing the index once more, I added all the content types to the Search by Page settings.
8. Ran cron many, many times.
9. Tried some keywords: some of the content was indexed correctly, and some of it was not indexed at all.

NOTE: I did not enable the search_attachments or search_paths modules, only the "Search by Page Nodes" module.

jhodgdon

That is very strange. I don't know why your content would be indexed when it is alone as the only content type, but not when you have multiple content types.

Maybe you can try changing the setting for how many Search by Page pages get indexed in each cron run -- that might help?

Also, when you see that only some of the content is indexed, did you verify on the Search module's settings page that it says Search by Page is fully indexed?

Let me know if either of these helps... and then I can also look at the code and see if I can figure anything else out.

Sinan Erdem

I really cannot solve the problem. On the Search module's settings page, after some cron runs, the number of un-indexed "Search by Page" pages stays the same: 412 or so.

The only thing that is not clear to me is "Minimum reindexing time" and "Maximum reindexing time". I set each of these to 1 second. Are they related to the problem? I run cron by hand to test, and I have a relatively fast server with 1 GB of memory, etc.

Sinan Erdem

Ah, I think I made a stupid mistake. When I set "Minimum reindexing time" to 1 second, the system tries to reindex every already-indexed page each time cron runs. So some of the content can never be indexed, because there is more content than the maximum number of items allowed to be indexed per cron run.
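
To illustrate the effect described above, here is a minimal toy simulation in Python. It is not Search by Page's actual code: the function name, the parameters, and in particular the assumption that due pages are processed in page-ID order with no priority for never-indexed pages are all hypothetical, chosen only to show how a very short reindexing time can keep the un-indexed count from dropping.

```python
def simulate_cron(total_pages, per_run_limit, runs,
                  min_reindex_seconds, seconds_between_runs):
    """Return the number of never-indexed pages after each cron run."""
    last_indexed = [None] * total_pages  # None means "never indexed"
    now = 0
    history = []
    for _ in range(runs):
        now += seconds_between_runs
        # A page is due if it was never indexed, or if its last
        # indexing is at least min_reindex_seconds old.
        due = [i for i, t in enumerate(last_indexed)
               if t is None or now - t >= min_reindex_seconds]
        # ASSUMPTION (hypothetical): due pages are taken in page-ID
        # order, with no preference for never-indexed pages.
        for i in due[:per_run_limit]:
            last_indexed[i] = now
        history.append(sum(t is None for t in last_indexed))
    return history


# 1-second reindexing time: the same 100 pages are redone on every run.
stuck = simulate_cron(790, 100, 8, 1, 60)
# One-day reindexing time: each run reaches 100 new pages.
draining = simulate_cron(790, 100, 8, 86400, 60)
print(stuck)
print(draining)
```

Under these assumptions, the first call stays at 690 un-indexed pages on every run, while the second drains to 0 after eight runs. The real module may order its queue differently, so the exact numbers will not match a live site.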

I think everything works OK now. Sorry for taking up your time with my stupid mistake.

But I must say that those indexing time settings are a bit confusing. Maybe there should be a warning to help avoid such mistakes.

Anyway, thank you for your help and time...

jhodgdon

Status: Postponed (maintainer needs more info) » Fixed

Sorry for the confusion! I think the README.txt file has some information on the settings. But still, Search by Page should be indexing content that has never been indexed, in preference for indexing content that is due for a new reindex... but anyway if it is working for you, I guess I won't worry too much!

And if you have suggestions for how the documentation and/or on-screen information about those settings could be improved, please let me know.

Sinan Erdem

Maybe just adding a warning note under these settings:

"Warning: After this period of time, all indexed items will be re-indexed. Please make sure you have enough cron runs to re-index all your indexed content, or some of your un-indexed content may not be indexed at all."
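
As a rough back-of-the-envelope check of what such a warning asks for, the arithmetic could be sketched as follows. This is only an illustration: the function and parameter names are hypothetical and do not correspond to any Search by Page setting names, and it assumes cron runs at a fixed interval and indexes a fixed maximum number of pages per run.

```python
import math


def reindex_fits(total_pages, per_run_limit,
                 cron_interval_seconds, reindex_seconds):
    """Return True if one full pass over all pages can finish
    before the first pages become due for reindexing again."""
    # Number of cron runs needed to touch every page once.
    runs_needed = math.ceil(total_pages / per_run_limit)
    # Wall-clock time those runs take at the given cron interval.
    return runs_needed * cron_interval_seconds <= reindex_seconds


# 790 pages, 100 per run, hourly cron:
print(reindex_fits(790, 100, 3600, 1))      # 1-second reindexing time
print(reindex_fits(790, 100, 3600, 86400))  # one-day reindexing time
```

With these example numbers, eight hourly runs are needed for a full pass, so a 1-second reindexing time cannot work while a one-day reindexing time leaves plenty of room.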

jhodgdon

Category: support » bug
Status: Fixed » Active

SBP is supposed to index never-indexed pages before it goes on to time-to-reindex pages. I will look into this further...
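
The intended ordering described here, never-indexed pages first and pages due for reindexing second, could look something like the following Python sketch. It is not taken from the module's source: `pick_batch`, its parameters, and the dictionary layout are all hypothetical and serve only to show the idea.

```python
def pick_batch(pages, now, per_run_limit, min_reindex_seconds):
    """Choose which pages to index in one cron run.

    pages maps a page path to its last-indexed timestamp,
    or to None if the page has never been indexed.
    """
    # Never-indexed pages always go to the front of the queue.
    never = [p for p, t in pages.items() if t is None]
    # Pages due for reindexing follow, oldest first.
    stale = sorted(
        (p for p, t in pages.items()
         if t is not None and now - t >= min_reindex_seconds),
        key=lambda p: pages[p],
    )
    return (never + stale)[:per_run_limit]


pages = {"node/1": 10, "node/2": None, "node/3": 5, "node/4": None}
print(pick_batch(pages, now=100, per_run_limit=3, min_reindex_seconds=1))
# prints ['node/2', 'node/4', 'node/3']
```

With this kind of ordering, a short reindexing time can slow down reindexing of old pages, but it cannot by itself keep new pages out of the index.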

jhodgdon

Title: Doesn't index some of the pages. » Settings can cause content not to be indexed, so add warnings
Component: Main Search by Page module » Documentation
Status: Active » Fixed

Oh, I see. This *is* just a problem with the settings. I will add warnings on the settings page and in the README file.

I've updated both 6.x-1.x-dev and 7.x-1.x-dev with these changes. At least, I've committed the changes. They should be in the dev releases within 24 hours.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.