I have nodes on my site that are only meant to be seen in a view, not viewed directly. I've therefore set up this module to not index that node type, and to instead index the view page these nodes are displayed on.

However, I've come across a problem: because my view is paginated, the module can't find nodes that aren't on the first page of the view (because the path is different for every other page). My view page has the path 'multimedia' (so this is what I added to this module's path settings), but every other page has the path 'multimedia?page=[n]'...

Is this a bug or am I not doing something properly? I tried changing the path in the search settings to 'multimedia*', but that stopped all nodes on that page being displayed.

Comments

jhodgdon’s picture

Category: support » bug

You are not doing anything wrong... I've actually had this question in the back of my mind for some time, but I didn't know what to do about it. Thanks for bringing it to the top of my attention -- having a user with a real need is always good motivation. :)

So, for the moment, if the view has only a few pages, you can add each one as a separate path in Search by Page Paths. That may help you.

Going forward, I will have to figure out what to do. One option is to add a new "pager variable" field to a SBP Paths item, so you can put in the word 'page' (for example) to indicate that Search by Page should try to index ?page=1, ?page=2, etc. after the end of the path you specified. But it's still possible that between when you index and when you search, some particular piece of content could end up on a different page, so you might get a search result that sends you to page 4 of the view, when the content you were searching for is really now on page 5 because some new content has been added.

That aside, I think the pager variable field is probably the best option, to keep it fairly automated. Any thoughts?
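
The page-drift concern above can be made concrete with a little arithmetic. This is only an illustrative sketch, not Search by Page code: it assumes a view sorted newest-first with a fixed number of items per page and Drupal-style zero-based `?page=` values, and the function name `page_param` is made up for the example.

```python
def page_param(position, per_page):
    """Return the zero-based ?page= value for an item at a given
    zero-based position in the listing."""
    return position // per_page


per_page = 10
position_at_index_time = 39   # last item on the page reached via ?page=3
new_items_since_index = 1     # one newer item pushes everything down by one

indexed = page_param(position_at_index_time, per_page)
actual = page_param(position_at_index_time + new_items_since_index, per_page)
print(indexed, actual)
```

Under these assumptions a single new item is enough to move content that sat at the end of one page onto the next, so a search result indexed before the change would point one page too early until the next indexing run.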

BWPanda’s picture

I was originally thinking the wildcard option would be best (e.g. multimedia*), but then realised that the nodes I don't want shown have the path multimedia/[node-title], so for my situation at least, that wouldn't work.
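
To illustrate why a wildcard would be too broad here, the sketch below uses Python's shell-style globbing as a stand-in. This is an assumption for illustration only: nothing in this thread indicates that Search by Page supports wildcards in its path settings, and the example paths other than 'multimedia' are hypothetical.

```python
from fnmatch import fnmatchcase

pattern = "multimedia*"
paths = [
    "multimedia",             # the view page itself
    "multimedia?page=2",      # a later pager page (wanted)
    "multimedia/some-title",  # an individual node page (not wanted)
]

# With glob semantics, '*' matches any run of characters, including '/'.
matches = {path: fnmatchcase(path, pattern) for path in paths}
for path, matched in matches.items():
    print(path, matched)
```

All three paths match, so a glob of this kind could not separate the pager pages from the node pages that share the same prefix.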

I'd like to see this as an automatic feature (since most people that have paginated views would want all pages indexed), that can be optionally turned off. As far as I know, '?page=[n]' is the only syntax Views uses for pagination, so it's safe to say this will be the same for all sites. I therefore recommend a checkbox (checked by default) that says this page is paginated and that means all '?page=[n]' paths will be indexed in addition to the given path.

As for the difference between indexed and actual results pages, this will really only be an issue on highly active/dynamic sites, which would generally have quicker cron runs anyway. Additionally, Google still has this problem (sometimes the text I search for is on a different page to the one they send me to), so I wouldn't worry about this so much - if they can't work out how to fix this, no point you beating yourself over the head about it too :)

BWPanda’s picture

Might be a separate issue, but when I add 'multimedia?page=1' as my path, I get the following error in the log:

content for PID (30), path (multimedia?page=1), realpath (multimedia?page=1) was not indexed (2)

jhodgdon’s picture

I will look into that error, and into adding the ability to search paginated views to the module.

Thanks for taking the time to report the errors...

jhodgdon’s picture

RE: #3 -- that's a "not found" error. So I'll need to look into why that path didn't work.

jhodgdon’s picture

RE: #3 - I have fixed this error in the 6.x-1.x-dev version of Search by Page, so you can at least enter paths with ?page=1 etc. in them in Search by Page Paths, and they will be searchable. When the next 6.x-1.x-dev release comes out (check the date - should be Feb 9 or later), that fix will be included.

I'm leaving this issue open, however, since I haven't yet solved the paginated views issue fully.

jhodgdon’s picture

Category: bug » feature

I guess what's remaining to be resolved is a feature request.

Lonely_cowboy’s picture

I'd like to thank you for this great module and to support the feature request for generalized path search index environments. My current issue is most probably identical to what has already been requested, but just to illustrate and emphasize:
I have a view that shows a table overview under node/16. Each row is linked to another view showing much more detail under node/16/detail/1 etc. For now I have entered each row as its own path declaration in the Search by Page settings, but it would be nice to use node/16/detail/[n] or node/16/detail/% for indexing. In my case this variable corresponds to the [uid], since I'm providing extended profile search by putting core profile content, grouped by CCK fieldgroups, into the view and displaying it the way I need.
Although the content profile module would provide easier access (= node content) to be indexed by the core search module, it turned out to be much more difficult to integrate smoothly into my user profile interface, and it would only show split profile field group search results (since each one became a single node), so I dropped it again and went in favour of Search by Page.

jhodgdon’s picture

Just trying to understand here -- this doesn't really sound like pagination to me, but quite a different use case. In pagination, it would be possible for Search by Page to actually figure out (from Drupal internals) how many pages are defined during search indexing, and index each page. And it would need to be dynamic, because presumably as content/comments/etc. are added to the system, the number of pages would change. Also, the URLs for pagination are generally (base page URL)/?page=N

But this looks like something different -- a set of pages with similar URLs with a numeric suffix?

jhodgdon’s picture

OK, so here's a summary of what I think needs to be done for this feature request.

Main use case:
- Using Views or some other module, you have a set of pages using Drupal's standard paging mechanism, with paths "foo?page=#", where # ranges from 1, 2, 3, ... You want Search by Page to index each of the pages.

How to implement:
- Assume that Drupal's standard paging mechanism is being used.
- In the Search by Page Paths setup page, add fields for "pager index" (like what Views lets you do if the pager is not working, you can override the pager index), and "pager URL fragment" (where you would enter "page" to have URLs like foo?page=# and "blah" if your URLs are foo?blah=#).
- When indexing, figure out how many pages there are from the Drupal pager variables, and index page foo, as well as foo?page=N for all existing pages.
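
The indexing step in the list above could be sketched roughly as follows. This is a language-neutral illustration in Python, not the module's code: `paged_paths` and its parameters are hypothetical names, the page count is passed in directly rather than read from Drupal's pager variables, and the "pager index" field is not modelled. It assumes Drupal-style zero-based paging, where the bare path is the first page.

```python
def paged_paths(base_path, pager_key, total_pages):
    """List the paths to index for a paginated page.

    Assumes zero-based paging: the bare path is the first page and
    ?<pager_key>=1 is the second.
    """
    paths = [base_path]
    for n in range(1, total_pages):
        paths.append(f"{base_path}?{pager_key}={n}")
    return paths


for path in paged_paths("multimedia", "page", 4):
    print(path)
```

Because the page count changes as content is added, a list like this would need to be rebuilt on each indexing run rather than stored once.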

Lonely_cowboy’s picture

Sorry if I wasn't clear enough earlier (comment #8).
I believe you got the point correctly and your implementation should work.
I have a view of type page at url foo?q=node/16 showing a set of table rows containing urls of a second view using the [uid] as #.
The view's URLs may currently be misdefined as foo?q=node/16/detail/#, with no = before the #, but I could change this naming scheme to e.g. foo?q=node/16&detail=#. This should comply with query-string conventions and with your foo?blah=#, since the "blah" part, "q=node/16&detail", does not change for different #.

Drave Robber’s picture

Has there been any progress on this?

This feature could be immensely useful for searching Aggregator pages – afaik there's no other way to do that, apart from installing Solr, which is not always possible.

jhodgdon’s picture

Sorry, I haven't made any progress on this yet. I've been pretty busy lately (we're trying to get Drupal 7 out the door, and I'm one of the core developers, as well as being the co-lead of the Documentation team). But hopefully things will be a bit quieter once Drupal 7 is released, and this is probably my highest-priority feature to add.

Drave Robber’s picture

Thank you for the quick reply;

when something comes out, I'm ready to serve as a tester for it (at the moment, I have ~170 pages of aggregated outside feeds on my site, and will probably have 300...400 pages by Feb :)

Triskelion’s picture

I have a dirty workaround for this issue. I appended ?items_per_page=All to the end of the view path so the indexing would use the full view results. I have set the indexing throttle to 10 per cron run in case the views take a long time to index.
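
For anyone scripting this workaround over many paths, here is a minimal sketch of appending the parameter safely, including when a path already has a query string. The helper name `with_query_param` is hypothetical, and whether a given view actually honours `items_per_page=All` depends on how that view is configured.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(path, key, value):
    """Return path with key=value in its query string, replacing any
    existing value for the same key."""
    parts = urlsplit(path)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != key
    ]
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_query_param("multimedia", "items_per_page", "All"))
print(with_query_param("multimedia?sort=asc", "items_per_page", "All"))
```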

jhodgdon’s picture

Version: 6.x-1.7 » 7.x-1.x-dev

mibfire’s picture

Any update on this?:)

jhodgdon’s picture

Sorry, no. I have not been making time for any of my contributed modules lately, since I've been busy working on Drupal 8 Core and the new User Guide project for Drupal 8. In fact, this module is looking for a new maintainer... hopefully someone with more time.