There is no meaning to loading database fields and calling them entity properties. You have no idea what the entity controller class is and what it might do during loading, equally you do not know what load hooks might do.

For example, good luck integrating entity cache into this.

The only meaningful thing you can do with an entity is to entity_load it and that's it.

It's frightening that this module exist because it shows people are STILL coding around the APIs instead of using them. I can't possibly imagine a use case for this -- if you have really large entities and you want to process a number of them use drush or batch API.

Comments

Nick_vh’s picture

We had this discussion on IRC. The module serves its purpose and helps solving a use-case I had and seems to help some other folks.
I'm ok with adding a warning sign that this does NOT take anything out of cache and does not always get what you expect but it does make EFQ more usable.
The idea is that this is like a db_query but then less complex and you work with semantic language to form your query. Not more, not less.

I'd be happy to talk a little more on how we can solve this in a higher level so that EFQ becomes more useful and not just a class to search entities with.

Some replies on twitter back this up:
https://gist.github.com/attiks/a555ff84e16100d62532

Again, I'm not saying this should be part of core, nor was it my intention to avoid using APIs. This is a way to solve what I needed.

attiks’s picture

Moving my use case to this issue:

Use case:

Drupal 7
User has 30+ fields
500.000+ users

System is used to send out personilized mass mailings, the sending part is done by an external system that queries the Drupal site to retrieve the next batch of users (500 each time), and for each user we only need some of the fields (uid, name, mail, status, language, first name, last name), all other fields are not relevant for the sending.

We now use a db_query to select the fields we need, using http://drupal.org/project/efq_extra_field is a better way. I know for D8 we can still construct our own query, but something like efq_extra_field would be nice to have.

The reason we want to avoid entity_load is performance, we only need to query a certain user once a week, storing all 500.000 users into a cache is over kill.

I agree this is not for the faint of heart, but it has it's place in the drupal eco system.

attiks’s picture

Caching all user objects to only query them once a week or so is IMHO over kill, during that week some fields on the user entity can be changed so most of the cache will be stale anyways. Whenever a user logs in, we update some fields from an external ERP system, because they have to be correct. The only reason users login is the change some of their account details.

Keep in mind this is an edge case, I'm only supplying it as an example why this module might be usefull

DamienMcKenna’s picture

@attiks: The extra detail you provide for your edge case, specifically

we update some fields from an external ERP system

suggests that maybe the system should just talk directly to the ERP system rather than load data from Drupal at all?

attiks’s picture

#4 in an ideal world you would be right, but we don't have control over that part, it's an old, slow and unstable system so we can not rely on it and whenever the backup is running their API end points aren't even accessible.

DamienMcKenna’s picture

@attiks: So again it seems like you'd be better off caching everything and then only dealing with data updates as needed, IMHO that would also simplify code maintenance.

Nick_vh’s picture

Title: The whole module is broken and invalid » Caution: This module comes with warnings

renaming the title to be a little less dramatic and I don't see how we can fix anything here.

Can we work towards a textual explanation warning users before they use this module? That might be a good way to scare starters but still leave this module for those that really want to use it.

chx’s picture

Title: Caution: This module comes with warnings » The whole module is broken and invalid

The question here is, if you want to deal with SQL data why are you calling it an entity and abusing EFQ over it? That's the problem. You do not have entities if you talk to SQL directly.

Edit: sorry: x-post.

Nick_vh’s picture

The returned object is by no means an entity. It is basically a search tool that allows you to also give you some of the metadata that you otherwise would have available if EFQ would not remove the data at the first place.

Perhaps the question is, why do we cut off that information if it is available in the database layer? If we, as developers, know that the info that is availabel in the db layer is the correct one (say 99% of the usecases) then I don't see why we could not use efq_extra_field as a handy tool to do just that.

Anything else is probably not in reach of this module and I am not planning to extend it to support more. I am not aware how mongodb works in Drupal 8 nor in Drupal 7 but I use efq and extend it & creating extra helper functions to get rid of the finishQuery limitations in EFQ in the first place.

I'd agree that a simple db_query would solve the same issue, but that requires loads more knowledge on where your stuff is stored.

Nick_vh’s picture

Title: The whole module is broken and invalid » Caution: This module comes with warnings

renaming again, no problem for the X-post.

attiks’s picture

#6 but caching would still mean we need multiple queries and we load more data then we need. The current implementation is only executing one query for each batch. Keep in mind that speed is pretty important, the idea is to send out all emails as fast as possible so each query we can avoid is a win.

We have some caching in place, for the cases the ERP system isn't accessible, but that's custom. I'll have a look to see if entity caching can improve some parts.

DamienMcKenna’s picture

@attiks: I guess I don't understand what query is being done that doesn't involve data being stored in the $user object. A query to load a cached object from the cache_entitycache_user table should be pretty damned quick, even considering 500 such queries. Can the other per-user data that's needed be restructured to be part of the $user object via hook_entity_load()?

chx’s picture

> If we, as developers, know that the info that is available in the db layer is the correct one

You can't know that as a contrib module.

> then I don't see why we could not use efq_extra_field as a handy tool to do just that.

And that's the fundamental problem, that you don't see. That's why I am so bothered. If you, a Drupal engineer at Acquia don't see then who else doesn't see it and who does?

> I'd agree that a simple db_query would solve the same issue, but that requires loads more knowledge on where your stuff is stored.

You already need to know that otherwise how did you plan verifying that the garbage you load from the database happens to contain the same data as your entities?

cpliakas’s picture

Title: Caution: This module comes with warnings » Vet the use cases of this module and discuss alternative solutions where appropriate
Category: bug » task
Priority: Critical » Normal

This is not a valid bug report as there are no details other than a rant in the OP. Let's try to be productive here. These kind of posts make me really angry at the Drupal community as they stifle innovation, establish a "Good ol' boy" culture, and are poisonous the to project as a whole. Let's bring this up to a higher level and vet the use cases and discuss alternative approaches where appropriate.

When someone writes code to solve a use case and makes the decision to contribute it back in hopes it will help someone else, they should be commended and collaborated with, not condemned. There are no Gods in open source, only people who invest their time and energy to solve problems and choose to share their solutions with others. Crap like the original title and post outweigh any good that comes from code contribution, so I hope the passion can be redirected appropriately for the betterment of Drupal.

Chris

chx’s picture

I think my history shows I appreciate contribution to Drupal quite a bit but this module fosters a mindset ("screw the APIs, hack something that works") which is patently harmful to Drupal. Like, people want to get backwards compatibility, they wanted it forever, that will never happen if people don't get past this mindset. I do not know how does that establish a "Good ol' boy" culture?

chx’s picture

Also, the absurdity of calling me out for stifling the innovation, can't you see that this mindset, this module is the one that is stifling innovation? How can we add a new feature to D7 (which now we do from time to time) if it is not enough to verify that the public API surface doesn't break but also that modules like this do not?

cpliakas’s picture

chx,

I don't know you, but I greatly respect your contributions with as much sincerity as I can express through typing. But I kindly ask that you at the very least try to respect the contributions of others as well. Respect doesn't mean that you have to agree or even like the approach, but instead of going after a developer, try to understand their thought process and why they came to the solution they did. Maybe the APIs you referred to didn't work for one reason or another. Maybe they aren't clear despite the best efforts to make them transparent and documented. It is possible that you could learn something useful here whether it is related to code or documentation. And maybe not. My point is there is a better way to approach this.

Even if the code and techniques are as terrible as you claim, I will fully admit that I too have contributed my fair share of crappy code. Everyone has, and if they say they haven't they are either lying or delusional. Some of my ideas were horrible and the project died. Others were useful, and through the power of open source there were personable and talented people who helped me learn different techniques and turned the idea into something better. Either way, what we can at least try to learn why these ideas are helpful to people. If the ideas aren't valid, they will go the way of the Dodo.

The attack approach stifles innovation for two reasons. The first, as mentioned above, is that there is no attempt to understand why this code exists. Therefore you are discarding any information that could be used to improve Drupal's APIs or documentation. Second, if I were Nick_vh I would say screw it and stop contributing. Why would I spend my time contributing to a hostile community? Nick_vh is probably a stronger person and better contributor than I am and can handle it, but not everyone is like Nick_vh.

I hope you can at least try to see this view point. If not, I'm sorry to derail this thread and waste our collective time.
Chris

attiks’s picture

#12 we now use 1 query to retrieve a batch of 500 users, query is custom build using the fields info to get the right tables, in short it's something like this

select u.uid, u.mail, f.field_firstname_value
from newsletter_mail n 
  inner join users u on n.uid = u.uid 
  inner join field_data_field_firstname f on u.uid = f.entity_id
  where n.status = 0
limit 500

If we have to switch this to entity_load, we'll need a query to get 500 uid and an entity_load_multiple with those 500 uid, this will be slower.

Using hook_entity_load to load the extra data from the ERP system, will basically kill the ERP system, if we fire a couple of queries per second to the system it starts flipping ;-)
We opted for fields on user because the client is using views (and views_saved_searches) to make selection of users, so we needed an 'easy' interface for them.

netsensei’s picture

I'm chiming in with a probable use case: I'm maintainer of http://drupal.org/project/commerce_product_display_manager. Presently, I'm cleaning out a few bugs to get a maintenance release out.

At it's heart, the module tries to generate a list of node entities and related products per node through product reference fields. It uses an elaborate db_query to get a result set in an efficiënt way. The original maintainer chose db_query and some elaborate contrib code to touch the DB directly. I even tried to reduce the complexity of the result set by optimizing the query in my patch in #1323376-7: No products and displays found when having multiple fields.

Why db_query and not EFQ?

This module does not return a flat list of data, it lists all defined relations between products/product nodes. Give the number of probable permutations, the original maintainer didn't want to load entire entities:

1. He didn't want to fetch entire entities in themselves since he's only am interested in an id, title and bundle. Retrieving anything else only presents overhead:
2. He still had to fetch products referenced to product display nodes. Given 1, it's overhead to load an entire entity just so he could fetch related entities through feild_get_items(), Entity API,... That's what the elaborate logic tries to accomplish.

I did contemplate changing the entire thing to EntityFieldQuery at one point, but I refrained from doing so out of performance considerations.

To me, the discussion boils down to this:

1. Developers still face use cases where they want granular control over what they fetch because of "perceived" performance considerations.
2. A module like EFQEF tries to pave this cowpath by eliminating the complexity of db_query()
3. The downside of such approach is that it such approach "circumvents" interlocked API mechanisms which are probably very necessary during data retrieval and guarantee data integrity.

Sounds like a catch 22 to me. Unless proven that EFQ/entity_load does not incur a significant performance hit over db_query solutions.

PS1: Commerce commonly models a relationship between product and node entities through a product reference field and PDM is hard wired to query the node table. But, theoretically, any entity bundle (user, term,...) could become a product display as soon as a product reference field instance is created...

PS2: Yes, I'm aware of http://drupal.org/project/inline_entity_form which takes a different approach to getting a better handle on managing large numbers of product/product display relationships.

My .02 cents.

netsensei’s picture

Ah yes, and http://drupal.org/project/translation_overview does something similar with a SelectQuery object, touching the node table semi directly.

chx’s picture

To the latest three or so comments: see #8. Going bare metal SQL with DBNTG for one-off edge cases is a problem but a known quantity -- you do not expect it to work generally. You know what you get is data from the database. Depending on your codebase it might or might not be the same as entities. You are aware you clearly bypassed Drupal.

An entity is a result of an entity_load.

There is no middle ground between these two. It doesn't exist. That's where this module tries to be.

jcisio’s picture

Status: Active » Closed (works as designed)

Ok, so we do have use cases. We use EFQ to leverage table joins, and this module extends EFQ, for sure. I don't think we need to add more use cases: all related to complexity/performance. I don't need to write a 20 line query, or use EFQ with caching for 500.000 entities for a function that is used one every a few weeks.

That said #8 is correct. EFQEF can't, immediately, fetch arbitrary entity data. It will not, in a foreseeable future, fetch arbitrary entity data mixed in SQL/NoSQL. But I don't think it is a big problem that the module names itself after EFQ. We have a lot image/imagecache related modules that only work with Imagick or GD. We have a lot WYSIWYG integration modules that only work with TinyMCE/CKEditor/etc. Nobody shown intention to rename the module, so I'm closing this issue.

Nick_vh’s picture

This module is as-is and comes with no warrantee (hence contrib).
This module allows you to return data from rows in your database using Drupal API function.
There might be some assumptions that require the use of MySQL and I do not guarantee that MongoDB works. If someone wants to fix that, feel free.
This module takes the original EFQ query and extends it so you can return those fields that would otherwise be available as if you would write a direct query
This module tries to be an enabler for developers that want to use EFQ because the syntax is quite easy but they don't want to do an entity load as they *know* that the data is only available in the database. This is on their own risk.
This module does not pretend to return entities, nor does EFQ

And it seems that there are enough use cases around to justify the module, knowing these constraints I feel ok in having this module out in the open.

chx’s picture

Title: Vet the use cases of this module and discuss alternative solutions where appropriate » The whole module is broken and invalid
Category: task » bug
Priority: Normal » Critical
Status: Closed (works as designed) » Active

Enough of this! People retitle and close one of the most important issues in years. It's important because as #23 shows, even Acquia engineers have no effin clue about what entities are, how they work, how they should work. And so who does?

> There might be some assumptions that require the use of MySQL

Like you are calling the methods of a DBTNG Select object? Yes it is my fault that some of the SQL code got mixed into the EntityFieldQuery class itself but that doesn't mean this "some assumptions" when you mean "this module is messing with SQL directly".

> If someone wants to fix that, feel free.

I already won't fixed the mongodb issue. And it's a can't fix really.

> This module tries to be an enabler for developers that want to use EFQ because the syntax is quite easy but they don't want to do an entity load as they *know* that the data is only available in the database. This is on their own risk.

This is (now) explained, barely on the project page. Much stronger wording is needed, I will get back to this.

> This module does not pretend to return entities, nor does EFQ

o_O What? EntityFieldQuery returns, as much as it was possible in Drupal 7, the identifiers of the entities. You can hardly do anything with that data but to entity_load it. (Yes, you can use the keys directly to store or retrieve some reference to the entities too, but that's beyond the point) So you took a class that worked with entities, abused the query builder out of it and that's it.

What about adding this to the top of the project page:

This module abuses EntityFieldQuery to return data from your database. It does not work with entities or any modules that build on them.

jcisio’s picture

#24 When Drupal added the Fast 404 pages features, did it add a warning like: "All modules that use hook_menu are broken and invalid. They could not declare paths ending with .css or .js etc.". Even it is not enabled by default, it a "broken feature" in the same manner as EFQEF with your saying.

All optimizations are done with a trade-off. This module is not an exception.

Nick_vh’s picture

Title: The whole module is broken and invalid » Vet the use cases of this module and discuss alternative solutions where appropriate
Nick_vh’s picture

@chx, Added that line to the project page.

chx’s picture

Status: Active » Fixed

Then let's call this fixed although the (repeated) insistence of "There might be some assumptions that require the use of MySQL " still worry me quite a lot.

Nick_vh’s picture

Maybe to clear that out, there is no such thing in the codebase of this module that I know of or am aware of that would bind this to mysql only. Please point us to where this would be in the code. I'm happy to add some code comments that there is a mysql-only hack or so. I did this with the best intentions and to be as non-hacky as possible to reach my goals.

That said, I'm happy to learn every day and understand the limitations when it comes to other db systems. However, it needs to be constructive and I still don't like how this was brought up. Let's move to the next page because such discussions don't serve any purpose here.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

chx’s picture

Ah, I didnt see #29, so it's specific to MySQL, sorry, it's specific to SQL which is bad enough.