Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
I'm trying to use EntityFieldQuery::fieldOrderBy to sort results, but it filters out results where the field is empty. I think this is not the proper behaviour (sort criteria shouldn't modify the results set).
Comment | File | Size | Author |
---|---|---|---|
#9 | drupal-1611438-9.patch | 1.15 KB | tim.plunkett |
#5 | without-fieldOrderBy.png | 54.95 KB | bxtaylor |
#5 | with-fieldOrderBy.png | 51.83 KB | bxtaylor |
Comments
Comment #1
tim.plunkettI don't think this is major, but leaving for now.
This may not even be a bug, per se. What is the behavior you'd expect?
Comment #2
jordisan CreditAttribution: jordisan commentedI would expect empty values to be at the top or at the bottom (that even could be a parameter). But I don't think that sorting results by any criteria should filter out any result.
Do you know any possible patch or workaround? I'm trying to use Drupal APIs and avoid making my own SQL sentences...
Comment #3
pivanov CreditAttribution: pivanov commentedAgreed.
With the current behavior, the method name is misleading as well.
A default value argument would be nice to have in the future.
Comment #4
jordisan CreditAttribution: jordisan commentedSo... what about a solution in D7?
Comment #5
bxtaylor CreditAttribution: bxtaylor commentedHere are the steps I used to reproduce this bug:
I agree with the OP that the returned node should not be filtered out just because the field is not populated.
I think the breakdown is actually in the execute() method when this line is called:
The function being called here is hook_field_storage_query, which if I understand correctly, goes through the field table (in this case field_data_field_field_test) and returns results based on what it finds there - which would explain why using the fieldOrderBy method only gets you two results (in my example code) and not three.
Module code
(with fieldOrderBy method commented out)
Comment #6
bxtaylor CreditAttribution: bxtaylor commentedChanging issue component is appropriate, I think.
Comment #7
bxtaylor CreditAttribution: bxtaylor commentedRead through the entity related issues and these seem related:
#1226622: EntityFieldQuery doesn't support query for entities without a value for a specific field
#1240566: In field_sql_storage_field_storage_query, any condition on a field's column excludes entities with a NULL value for this column
Essentially, those issues state that because the
field_sql_storage_field_storage_query()
hook uses an inner join to return results, any NULL values won't be returned.Potential workaround here: http://drupal.org/node/1157006#comment-5465034
I'll see if that works at all and post any results here.
Comment #8
tim.plunkettComment #9
tim.plunkettHere's a patch proving that this is broken.
Comment #11
BerdirThe test shows why it can't work. Looking at the SQL queries that it generates:
There is no JOIN involved. Instead, it directly queries the field data table, and if there is no value in there, it obviously does not return anything.
Even when adding a property condition to force an involvement of the entity base table means that we start on the field table and then join the entity table.
So, the only way to fix this would be to always start with the entity base table and then join the field table with a left join which would have quite some performance regressions.
Comment #12
tim.plunkettI discussed this with chx in IRC:
There is a chance that fixing the bug is so horrible for performance that we just leave it, but we won't know until we try.
Comment #13
chx CreditAttribution: chx commentedTrying does not involve coding it: just time some plausible left join queries vs inner join ones. Check what efq generates (berdir already posted samples) and change inner to left. And check the number of results too.
Comment #14
BerdirOk, made a few tests on a real, large database. 540k nodes, tested two fields, one with 40k rows and another with 500
First the current query, directly on the field table, no joins involved.
(Hint: We have a custom index on the value column of field_B_value which gets the second query down to 0.00)
Now, a query that starts on the node table and uses a left join
(Hint: the mentioned index does not improve this query, although you might be able to create one that does)
Slow enough?
You're of course invited to do your own tests and do not need to believe me :)
Comment #15
BerdirGiven the above performance regressions that this change would imply to document this as by design and keep the issue open as a non-major feature request in case someone comes up with an idea to solve this in a nice way.
Comment #16
tim.plunkettAgreed.
Comment #17
bxtaylor CreditAttribution: bxtaylor commentedThanks for those tests @Berdir!
While the
fieldOrderby()
method does not have the behavior I'd expect, I agree that this is a "works as designed" situation.I think, however, it'd be better to close this as "works as designed" and open a new feature request issue that references this issue.
That way, anybody that lands on this issue can see that we put it to the test and closed it appropriately.
The behavior does need to be documented as @tim.plunkett suggests, but that should be a separate issue, too.
Here are the follow up issues:
#1662942: Make EntityFieldQuery::fieldOrderBy include NULL field values
#1662950: Document that EntityFieldQuery::fieldOrderBy will exclude Entities with empty field values.
Comment #18
matt2000 CreditAttribution: matt2000 commentedI couldn't reproduce anything like Berdir's results on a database of 134k nodes. I found LEFT JOINs to be no worse than a couple hundredths of a second slower than INNER JOINs, and even slightly faster on one table.
I'm currently generating another 400k nodes for further testing, but I'd like to raise the question of whether performance consideration for large sites should justify disallowing developers of small sites from even the option to use LEFT JOIN. (I am not suggesting it should be the default. My proposed solution proof-of-concept is http://drupal.org/node/1057110#comment-6179538 . And yes, I know about hook_query_tag_alter, and I feel that it's terrible DX for this requirement.)
Comment #19
BerdirThe real performance difference isn't between LEFT or INNER JOIN.
It's between a query directly on the field table (which is what EFQ does by default) and one that starts of on the node table and joins the field table. And as you can see in my results, I had equal performance on a set of 40k nodes between left and inner join, it only really made a difference with a much larger result set, possibly because it had to use a temporary table on the disk or something.
bxtaylor created a feature for allowing to do a left join if you really want to, what we agreed on here is that it's not a *bug* (which would have to be fixed by default). Because that would mean that you'd always need to start of the base table and then left join the field table, no matter what condition/orderBy's you have. Not about LEFT vs. INNER JOIN, that doesn't affect this.
Comment #20
tim.plunkettLeave this as is, continue on in #1662942: Make EntityFieldQuery::fieldOrderBy include NULL field values
Comment #21
forssto CreditAttribution: forssto commentedI just have to say I'm incredibly bummed out by this issue and the way people seem to regard it.
To say that a query interface that may change the contents of the result set depending on its ordering is working as designed is pretty much reckless.
Yes, there may be performance benefits in taking the broken path, but what use are the performance benefits if they result in functionality that's counterintuitive.
Anyways, I just patched this in our class that wraps EFQ. The solution can be summarized this way:
Apologies if there's any bugs of missing functions. I tore this off a bit more comprehensive suite without any in-depth testing or sanity checks. I'm sure the code could be cleaner, more concise or more performant, but at least it works... Which is more than I can say about EFQ's sorting interface. Send me a message at tommi@kollabora.com if you want to discuss further.
Comment #22
Rob230 CreditAttribution: Rob230 commentedI agree it is a shame. Regardless of performance implications, I fail to see how it can be considered correct that ordering a query changes the result set. Ordering should change the order; it should not remove rows.
I found this bug when using tableSort(). If you click on some of the column headings then the results get ordered, but if you click on one of the headings, you end up with half of the rows removed. It doesn't make any sense to the user that trying to order the results removes some of them. You would expect that blank entries appear first or last.
I understand that it is difficult because of performance reasons, but it's a shame that it was just given up on when (in my opinion) it is not correct behaviour.
Comment #23
tim.plunkettThis issue is closed, please follow-up on #1662942: Make EntityFieldQuery::fieldOrderBy include NULL field values