I was just thinking about the issue of RES->M (the estimated total number of results) in the XML being wildly inaccurate in many cases. Quite aside from the 1,000-result maximum, I've seen this value change dramatically between pages: from something like 800 to 40 in one extreme case. And this is with filter=0.

The module behaves as well as can be expected given the misinformation: if it thinks there are 30 pages and you go to the 'last' page, you may well find yourself on page 8 of 8. There are no errors or empty pages, just a 're-evaluation' of the number of pages, so it's not too bad.
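A minimal sketch of the pager arithmetic behind that 're-evaluation', assuming 10 results per page. The function names and the per-page value are illustrative, not the module's code. The page count is derived from whatever RES->M the current response reports, so when the 'last' request comes back with a smaller estimate, a pager that clamps the requested page lands on the new final page:

```python
RESULTS_PER_PAGE = 10  # assumption: 10 results per page


def page_count(estimated_total: int, per_page: int = RESULTS_PER_PAGE) -> int:
    """Pages implied by an estimated total (always at least one page)."""
    return max(1, (estimated_total + per_page - 1) // per_page)


def clamp_page(requested: int, estimated_total: int) -> int:
    """Clamp a zero-based page index to the last page the estimate allows."""
    return max(0, min(requested, page_count(estimated_total) - 1))


# First response estimates 300 results, so the pager offers 30 pages.
print(page_count(300))     # 30
# 'last' requests index 29, but that response estimates only 80 results.
print(page_count(80))      # 8
print(clamp_page(29, 80))  # 7 (zero-based), i.e. "page 8 of 8"
```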

Still, a possible workaround occurred to me. It's ugly, but...

In the session or database cache (with an appropriate expiry), remember the number of results for a given query, keyed by a hash of the query parameters excluding 'page'.
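A sketch of the cache key, assuming the query parameters arrive as a flat dict of strings. The helper name and the `gss_total:` prefix are hypothetical. Sorting the keys before hashing means the same query always produces the same key regardless of parameter order, and dropping 'page' makes every page of one search share a single entry:

```python
import hashlib
import json


def results_cache_key(params: dict) -> str:
    """Hypothetical helper: stable key for a query, ignoring 'page'."""
    relevant = {k: v for k, v in params.items() if k != "page"}
    # sort_keys makes the serialisation independent of parameter order
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return "gss_total:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = results_cache_key({"q": "drupal", "filter": "0", "page": "3"})
b = results_cache_key({"filter": "0", "q": "drupal", "page": "7"})
print(a == b)  # True: same query, different page
```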

After executing a search, check that cache to see if we know how many results there are. If we don't know, then do a second query to Google for what should be the final page of results, based on the data it gave us. With any luck, that second query will have a more accurate value for the total number of results, which we can then cache.

We then use the cached value to generate the pager, etc.
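The lookup-then-verify flow above could look roughly like this. It is a sketch under several assumptions: `TTLCache` is an in-memory stand-in for the session or database cache, `fetch_estimate(page)` is a hypothetical callable that runs the same search for a zero-based page and returns that response's RES->M, the key is assumed to have been computed from the query parameters excluding 'page', and there are 10 results per page. If the current request already is the implied final page, its own estimate is cached and no extra request is made:

```python
import time
from typing import Callable, Dict, Optional, Tuple

PER_PAGE = 10  # assumption: 10 results per page


class TTLCache:
    """Tiny in-memory stand-in for the session or database cache."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, int]] = {}

    def get(self, key: str) -> Optional[int]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: forget it
            return None
        return value

    def set(self, key: str, value: int) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def resolve_total(key: str, current_page: int, current_estimate: int,
                  fetch_estimate: Callable[[int], int],
                  cache: TTLCache) -> int:
    """Return a cached total, verifying it against the final page if unknown."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    last_page = max(0, (current_estimate + PER_PAGE - 1) // PER_PAGE - 1)
    if current_page >= last_page:
        total = current_estimate  # this response already is the final page
    else:
        total = fetch_estimate(last_page)  # the extra request to Google
    cache.set(key, total)
    return total


calls = []


def fake_fetch(page: int) -> int:
    calls.append(page)
    return 80  # pretend the final page reports a smaller estimate


cache = TTLCache(ttl_seconds=3600)
print(resolve_total("k", 0, 300, fake_fetch, cache))  # 80, after fetching page 29
print(resolve_total("k", 3, 300, fake_fetch, cache))  # 80, served from the cache
print(calls)                                          # [29]
```

The pager would then be built from the returned total rather than from the RES->M of the page being displayed.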

This feature would need to be optional, and off by default.

If the user is anonymous, and has cookies disabled, the session will be useless. Be sure to avoid doing an extra query on every search request in this instance.
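One way to honour both constraints, sketched with hypothetical names: the feature is opt-in, and anonymous visitors never rely on the session. This deliberately assumes every anonymous visitor might be cookieless, since that usually can't be detected within a single request. It also assumes results for a query don't vary per user, so a shared database cache is safe. Returning `None` means "use RES->M as reported and make no extra request":

```python
from typing import Optional, TypeVar

C = TypeVar("C")

WORKAROUND_ENABLED_DEFAULT = False  # the extra request is opt-in


def pick_total_cache(enabled: bool, is_anonymous: bool,
                     session_cache: Optional[C],
                     database_cache: Optional[C]) -> Optional[C]:
    """Choose where to remember verified totals, or None to skip the workaround."""
    if not enabled:
        return None
    if is_anonymous:
        # A session discarded after each request would miss every time and
        # trigger the extra query on every search, so only a shared cache
        # (or nothing at all) is acceptable for anonymous visitors.
        return database_cache
    return session_cache if session_cache is not None else database_cache


print(pick_total_cache(WORKAROUND_ENABLED_DEFAULT, False, "session", "db"))  # None
print(pick_total_cache(True, True, "session", None))  # None
print(pick_total_cache(True, True, "session", "db"))  # db
```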

That's pretty gross, but it might be worth considering.

Comments

meba

I think I'm fine with the solution we have now. The thing is, nobody clicks through to page 10 :-) What do you think?

jweowu

I agree that the current behaviour is fine in most cases, and should remain the default behaviour for the module (as this workaround would make searching less efficient, and Google may even resolve the problem some day).

If you do spot the effect, though, it absolutely looks like a bug, especially if you want to print the total results value on your search results page. So I think there's probably some value in making a workaround available.

I really just wanted to note down my thoughts, in case someone else felt motivated to implement the idea.

HenryLTV

Hey meba & jweowu,

First off, wonderful work on this module so far! Fortunately, it came at just the right time for us. We are planning a switch from Apache Solr to GSS, and I've been playing around with your module locally.

So I gather from this thread that this bug is on Google's side? I too noticed the page-number discrepancy when you go to higher pages, especially when "last >>" is clicked. For example, the initial search results would list 14 pages, but after clicking "last >>" the query string would show "?page=13" while the pager showed "Page 10".

On a smaller side note about the paging: would it be more consistent to have "Page 2" show "?page=2" in the query string instead of "?page=1"?

Again, keep up the great work!
HenryLTV

jweowu

Yes, the issue is at Google's side. All Google searches have this problem in my experience. You don't tend to notice it most of the time with a regular google.com search, because the result counts tend to be so massive!

The "page" GET parameter counting from zero is the standard behaviour for the Drupal pager. You'll see this on any set of paged results on a Drupal site. So while it could be modified for this module with some extra effort, it would be an inconsistent change.
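To make the zero-based convention concrete, here is a small illustrative mapping (helper names are hypothetical) between the `?page=` value and the label shown to users, applied to HenryLTV's 14-page example. The last line is only consistent with, not proof of, a pager that clamps the requested page once the estimate shrinks to 10 pages:

```python
def page_label(page_param: int) -> str:
    """Human-facing label for Drupal's zero-based ?page= value."""
    return f"Page {page_param + 1}"


def last_page_param(total_pages: int) -> int:
    """The ?page= value of the final page when there are total_pages pages."""
    return max(0, total_pages - 1)


print(page_label(0))        # Page 1 (no ?page= parameter, or ?page=0)
print(page_label(1))        # Page 2
print(last_page_param(14))  # 13, so "last >>" links to ?page=13
# If the re-evaluated estimate only supports 10 pages, clamping 13 to the
# new last index gives 9, which is displayed as "Page 10".
print(page_label(min(13, last_page_param(10))))  # Page 10
```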

Sree

Any further updates on this fix/workaround?

jweowu

I'm afraid I have no interest in implementing this workaround myself, and neither did meba. Sorry. Maybe if someone else is keen, they'll provide a patch. It's a fair amount of work for a somewhat dubious benefit, though (and implies nearly doubling the number of requests made to the search server), so I wouldn't count on it happening.