When there is a multiple value field for which each value must be displayed (thus creating partial "duplicate" records), the pager may miss records, particularly when there are other fields displayed as well.
The reason for this appears to be the fact that the count_query does not have the same logical structure as the data query, and thus it brings back a count of less records than the data query does.
How to replicate the problem:
- Create a content type with at least multiple-value CCK text field (I created a content type called "directory_entry" with a field named "last_name")
- Add another field to the content type (I added one called "phone_number")
- Create a bunch of nodes
- Create a view on this content type
- Page view
- Use pager (choose your page size)
- Select fields (last_name, node title, phone_number)
- Make sure it's sortable on last_name
- Filter on content type, last_name is not empty
...I had a brief look into the code. I'm not acquainted with the views code, but I noticed in include/view.inc that line 673 and 674 look like this:
$query = db_rewrite_sql($this->build_info['query'], $this->base_table, $this->base_field, array('view' => &$this));
$count_query = db_rewrite_sql($this->build_info['count_query'], $this->base_table, $this->base_field, array('view' => &$this));
...when I printed these queries off lower down in the code, just before they get executed, I got the following:
query:
SELECT DISTINCT(node.nid) AS nid, node_data_field_lastnames.field_lastnames_value AS node_data_field_lastnames_field_lastnames_value, node_data_field_lastnames.nid AS node_data_field_lastnames_nid, node_data_field_lastnames.delta AS node_data_field_lastnames_delta, node.type AS node_type, node.title AS node_title
FROM node node
LEFT JOIN content_field_lastnames node_data_field_lastnames
ON node.vid = node_data_field_lastnames.vid
WHERE (node.type in ('directory_entry')) AND (node_data_field_lastnames.field_lastnames_value IS NOT NULL)
ORDER BY node_data_field_lastnames_field_lastnames_value ASC ;
and count_query:
SELECT COUNT(*) FROM (
SELECT DISTINCT(node.nid) AS nid
FROM node node
LEFT JOIN content_field_lastnames node_data_field_lastnames
ON node.vid = node_data_field_lastnames.vid
WHERE (node.type in ('directory_entry')) AND (node_data_field_lastnames.field_lastnames_value IS NOT NULL) ) count_alias;
...so now the problem is obvious...the fact that the count_query is not including the other fields means that the DISTINCT will chop the result to less records. Because I'm not acquainted with the views code, I simply replaced line 674 with this:
$count_query = $query
...because it seemed intuitively odd to me that the count query would be different at all. I understand not using the record count function as it's often implemented in a very inefficient way (sometimes causing a complete traversal of the result in order to count the records) but I'm not sure I understand the reasoning behind generating the query separately. There's too much chance that something like this will happen, where the count query has a different logical structure, and therefore returns spurious results.
However, as I said...I don't know the views code, so maybe there's a reason I'm not seeing. Regardless, I hope this explanation helps fix the bug. For now, my "fix" works in my case, so I guess I'm not exercising anything that would be indicative of the reason for the differing count/data queries.
Comments
Comment #1
nokes CreditAttribution: nokes commentedOh, I should note as well that I had been having the same issue at least with 6.x-2.2 as well. I just noticed the problem, then updated to 2.3. When I saw that the bug was still there, I started looking into it and posted this issue.
Comment #2
David_Rothstein CreditAttribution: David_Rothstein commentedI'm relatively new to Views also, but from looking at the code in includes/query.inc -- in particular, the query() function -- it seems like Views already does what you suggest. That is, if the query has a DISTINCT in it, Views will not try to optimize the count query but rather return the original query with all fields included... at least, that appears to be the intention.
However, I also noticed that a recent bugfix went into this part of the code in #371923: Arguments always using distinct, whether set to or not... which seems like it might be related to the problem you're experiencing. Have you tried this with the latest -dev version of the Views module, and if so, can you reproduce it there or does the problem go away?
Comment #3
merlinofchaos CreditAttribution: merlinofchaos commentedIndeed you shouldn't be getting a different query if DISTINCT is set. Hm.
Comment #4
nokes CreditAttribution: nokes commentedSorry for the delay in getting back...been busy on other projects...I'll test with the dev version and see what happens.
Comment #5
nokes CreditAttribution: nokes commentedJust tested this with the most recent dev version (2009-Feb-25) and I've got the same count_query (and the same result). To clarify, I made the change at line 705 of include/view.inc, so that it looked like this:
...and got this result:
In case it's helpful, here's the export of the view:
Comment #6
nokes CreditAttribution: nokes commentedMore information:
Following up on David_Rothstein's comments, I checked out include/query.inc
The intereresting part of what I found may be that the views_query object has no value in it's distinct data member:
In include/query.inc, in function query, just prior to the foreach, I inserted this:
...which gave me this, clearly showing the distinct data member without a value:
...still digging...
Comment #7
nokes CreditAttribution: nokes commented..the DISTINCT is there all the way from the db_rewrite_sql commands (include/view.inc line 673 and 674), so obviously one of the modules is determining that DISTINCT should be set.
from modules/node/node.module:
...and since our primary table is "node" and our primary field is "nid", node.module is determining that the DISTINCT should be set. As you can see from above, though, the view doesn't seem to know that this is the case.
Comment #8
gpk CreditAttribution: gpk commentedPossible duplicates:
#488712: views pager doesn't work with repeating date (in the Date issue queue)
#521770: Displaying a Multi Value Field with a pager when using arguments
#552804: Records are missing when using pager and items per page
#553766: Pager doesn't work with multi-field CCK items?
Comment #9
khan2ims CreditAttribution: khan2ims commentedI have posted this at other forums too.
For me, the pager is not showing up if I set the Row Style to "Node". If I keep it fields and add in fields, then the pager shows up. And it doesn't matter if I have multiple views on that page with pager enabled.
F1 F1 F1 !!
Comment #10
merlinofchaos CreditAttribution: merlinofchaos commentedJuste to note, #9 is completely irrelevant to this issue.
Comment #11
attheshow CreditAttribution: attheshow commentedsubscribing
Comment #12
davidredshaw CreditAttribution: davidredshaw commentedDo we know the reason for having separate results and counting queries? Setting $count_query = $query fixed this for me but I'm sure there was a reason for having the two separately.
Comment #13
dawehnerThe count_query counts how many items has the full result, so based on this a pager can be built. tting $count_query = $query fixed this for me but I'm sure there was a reason for having the two separately.
Comment #14
esmerel CreditAttribution: esmerel commentedNo activity on this for more than 6 months
Comment #15
gpk CreditAttribution: gpk commentedActually I suspect this is a dupe of #552804: Records are missing when using pager and items per page (amongst others), which has been won't fix-ed. Roll on D7 (which fixes this via the new database layer).