There are a few places in the D7 port of Project* where it seems like the most simple and sane solution involving "off-the-shelf" parts would be to add a dependency on field_collection:

#1609382: Port release files with extra metadata to D7
#1612050: Figure out how to represent branches for D7 project_release
...

I really like the idea of just using a generic module that lots of other sites are using, instead of trying to roll our own custom entity types or otherwise implement these ourselves in Project*.

However, I don't want to get into a situation N weeks from now when I'm ready to deploy and the rest of the Infra Team freaks out about this new dependency. ;) So, I wanted to open this issue now to get "pre-approval" for deploying this.

Other than it being a useful/generic solution to some problems we're facing, field_collection has the following things going for it:

  • timplunkett is an extremely responsive and helpful co-maintainer. He's already offered to help in any ways he can as needed for d.o.
  • both merlinofchaos and I have worked with and contributed to field_collection in the past
  • even though it's still only on beta4, it's already got 14078 sites using it
  • timplunkett has said that an official 7.x-1.0 release is entirely feasible by the time we'd actually be deploying

Not sure what else to say. ;) killes, nnewton, drumm -- any objections?

Thanks!
-Derek

Next Steps

This issue is no longer postponed now that #1609382: Port release files with extra metadata to D7 is fixed.

Comments

gerhard killesreiter

Can we see the kind of queries that it produces?

dww

Uhhh. I don't know what you mean. It's D7 field API hell. It's a module for basically "fieldable fields". You make one field on the parent entity, which is essentially an entity reference to a sub-entity. Then you can add whatever fields you want to the sub-entity. So, for example, on a release node entity, we might have a multi-valued field collection called "release files". This just points to a new entity that field_collection creates for us called "field_release_files". Then we hang whatever fields we need off of that for each file associated with a release (a file field, download count, MD5 hash, etc.).
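To make that shape concrete, here is a minimal sketch of the parent/sub-entity layout, written in Python rather than as Drupal code. The class and attribute names (ReleaseNode, ReleaseFileItem, download_count, md5_hash) are illustrative assumptions based on the example above; they are not field_collection's actual API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReleaseFileItem:
    """Stand-in for one sub-entity; each attribute plays the role of a field."""
    file_path: str
    download_count: int = 0
    md5_hash: str = ""


@dataclass
class ReleaseNode:
    """Stand-in for the parent release node (hypothetical name)."""
    title: str
    # The multi-valued "release files" field: one entry per sub-entity.
    release_files: List[ReleaseFileItem] = field(default_factory=list)


def total_downloads(node: ReleaseNode) -> int:
    """Sum a sub-entity field across every item attached to the parent."""
    return sum(item.download_count for item in node.release_files)


release = ReleaseNode(
    title="example 7.x-1.0",
    release_files=[
        ReleaseFileItem("example-7.x-1.0.tar.gz", download_count=12),
        ReleaseFileItem("example-7.x-1.0.zip", download_count=3),
    ],
)
```

The point of the sketch is only that the parent holds references to items, and the per-file data lives on the items.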

Fields might not even live in a DB, so "the kind of queries that it produces" is sort of a meaningless question. Everything gets reassembled during entity_load(). That's The D7 Way(tm). You missed your opportunity to object to that about 2 years ago. ;)

Cheers,
-Derek

gerhard killesreiter

Well, whatever the API, it might somehow impact the database. And since that's the part of the infra that is hardest to scale, the question makes total sense. If the answer is "it doesn't impact the DB queries", then this can be marked fixed, of course.

dww

I think you need to spend more time with the internals of D7 to know what I'm talking about. Field API has pluggable storage. By default, every field lives in a separate DB table. However, you can swap out the storage such that all the fields for a given bundle (think "node type") live in a single DB table. Or, you could store your fields in MongoDB. Or any other crazy storage mechanism you want. The code isn't supposed to know how you're storing your fields.
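As a rough illustration of what "pluggable storage" means here, the following Python sketch puts two toy backends behind the same load call. Everything in it (FieldStorage, PerFieldStorage, PerBundleStorage, load_fields, the field names) is a hypothetical stand-in and not the Field API's real interface; it only shows that calling code can stay the same while the layout underneath changes.

```python
from typing import Dict, Protocol, Tuple

FieldValues = Dict[str, object]


class FieldStorage(Protocol):
    def load(self, bundle: str, entity_id: int) -> FieldValues: ...


class PerFieldStorage:
    """One 'table' per field, keyed by (bundle, entity_id)."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[Tuple[str, int], object]] = {}

    def save(self, bundle: str, entity_id: int, values: FieldValues) -> None:
        for name, value in values.items():
            self.tables.setdefault(name, {})[(bundle, entity_id)] = value

    def load(self, bundle: str, entity_id: int) -> FieldValues:
        key = (bundle, entity_id)
        return {name: rows[key] for name, rows in self.tables.items() if key in rows}


class PerBundleStorage:
    """One 'table' per bundle, holding every field of an entity in one row."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[int, FieldValues]] = {}

    def save(self, bundle: str, entity_id: int, values: FieldValues) -> None:
        self.tables.setdefault(bundle, {})[entity_id] = dict(values)

    def load(self, bundle: str, entity_id: int) -> FieldValues:
        return dict(self.tables.get(bundle, {}).get(entity_id, {}))


def load_fields(storage: FieldStorage, bundle: str, entity_id: int) -> FieldValues:
    """Calling code depends only on load(), not on how values are stored."""
    return storage.load(bundle, entity_id)


values = {"field_md5": "abc", "field_downloads": 7}
per_field, per_bundle = PerFieldStorage(), PerBundleStorage()
for backend in (per_field, per_bundle):
    backend.save("release_file", 1, values)
```

Both backends return the same values from load_fields(); only the number of underlying "tables" differs (one per field versus one per bundle).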

Nothing about field_collection changes any of these fundamental aspects of how D7 fields work. All field_collection is doing is making it easier for me to assemble the fields and entities I care about to represent the stuff I need to represent. And it makes it possible for sites to extend the kinds of things Project* is storing just via the UI instead of necessarily having to write code for it.

But ultimately, the (non)performance of all the fields and entities is D7 core's problem. That's really a separate question. I agree that we should be afraid of how d.o is going to scale with D7 core fields. We're probably going to have to jump through a lot of hoops to make that work. But that's totally orthogonal to the question of whether those fields are coming from the core field UI, created programmatically in Project*, clicked together via field_collection, etc.

gerhard killesreiter

I still want to know the database impact. :p

If you say that the impact will be the same regardless of whether we use field_collection or not, that's fine too.

senpai

I think what we should do is benchmark the loading of two entities. One with a hundred fields in it, and one with 10 fields that also includes a field collection containing 90 fields. Let's see if there's any real world difference between the two in a direct comparison test. If it's acceptable, we use Field Collection. If it's not, we don't.
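A comparison like that could be driven by a small timing harness along these lines. This is only a sketch in Python: the two loader functions are placeholders that build dictionaries in memory, so the numbers they produce say nothing about Drupal. In a real test they would be replaced by calls that load the two actual entities.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(loaders: Dict[str, Callable[[], object]], runs: int = 50) -> Dict[str, float]:
    """Return the median wall-clock seconds per call for each named loader."""
    results = {}
    for name, load in loaders.items():
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            load()
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


# Placeholder loaders (assumptions): swap in code that loads the real entities.
def load_flat_entity():
    return {f"field_{i}": i for i in range(100)}


def load_entity_with_collection():
    entity = {f"field_{i}": i for i in range(10)}
    entity["collection"] = {f"field_{i}": i for i in range(90)}
    return entity


timings = benchmark({
    "100 fields": load_flat_entity,
    "10 fields + 90 in collection": load_entity_with_collection,
})
```

Using the median over many runs keeps a single slow run from skewing the comparison.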

Come to think of it, the creators of Field Collection have to have done something like this before in order to see if their idea was valid. Let's ask them for some metrics, shall we?

dww

Assigned: dww » Unassigned
Status: Needs review » Postponed

Just had an IRC chat with nnewton, killes, and timplunkett. Summary:

  • No one actually knows enough yet to make an informed decision about whether this is going to be a problem or not.
  • The possibility that D7 field API is a performance nightmare for d.o is real, but more or less unrelated to whether we use field_collection or not.
  • The performance hit depends a lot on the usage. Specific node pages (e.g. project/foo or release node/N etc.) might be fine, but if we're trying to sort/filter based on things inside a field_collection, it could be worse. It's not just a question of how we store the data; we have to see how it's being used.
  • We all agreed that starting with a relatively simple field_collection use case, building it, and then trying to assess how it actually behaves is prudent. So, we can't completely commit to field_collection yet, but we can at least move forward and see what happens. I think #1609382: Port release files with extra metadata to D7 is the best candidate for that.
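To illustrate the sort/filter concern from the summary above, here is a small sketch using Python's sqlite3 module. The three tables are a deliberately simplified, hypothetical schema (not the tables Drupal or field_collection actually create); the point is only that sorting parents by a value stored on a collection item needs one join to reach the items and another to reach the field on them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
    -- Parent field (hypothetical): maps a release node to its collection items.
    CREATE TABLE field_release_files (entity_id INTEGER, item_id INTEGER);
    -- A field stored on the collection item itself (hypothetical).
    CREATE TABLE field_file_downloads (item_id INTEGER, downloads INTEGER);

    INSERT INTO node VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO field_release_files VALUES (1, 10), (1, 11), (2, 20);
    INSERT INTO field_file_downloads VALUES (10, 5), (11, 7), (20, 40);
""")

# Sort release nodes by the highest download count among their files.
rows = conn.execute("""
    SELECT n.title, MAX(d.downloads) AS top_downloads
    FROM node n
    JOIN field_release_files rf ON rf.entity_id = n.nid
    JOIN field_file_downloads d ON d.item_id = rf.item_id
    GROUP BY n.nid
    ORDER BY top_downloads DESC
""").fetchall()
# rows == [('beta', 40), ('alpha', 7)]
```

Loading a single node page only follows these references for one entity, which is why that case is expected to be cheaper than listing pages that sort or filter across many.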

So, that's the plan. I'll move forward with field_collection at #1609382 and then we can circle back and try to assess again once we have something real to test. For now, this issue is postponed...

damien tournoud

I would recommend not using Field Collection in places where it would make more sense to have a custom entity type. That's the case when there is strong business logic in play. Not sure whether it applies here or not.

The impact of field collection itself in terms of performance should be the same as creating a custom entity type. Nothing we cannot manage by denormalizing some queries.

j0rd

I'm curious if the performance implications of Core Fields vs. Custom Entity vs. Field Collection ever came to light. This would be useful information for people like me, who find this thread via Google.

My guess is that Custom Entities would be less of a burden on the database, since multiple field values live in a single table when using the standard MySQL database backend... but I'm curious what you guys have discovered.

senpai

Status: Postponed » Active

The field_collection module is now a dependency of D7's project_release, and thus it needs to go live on drupal.org.

senpai

Issue summary: View changes

Adding a Postponed section

senpai

Status: Active » Fixed

So that's that then. It's been five months with no benchmarks and no real decision-making discussions, the module is already in the staging server's codebase, and it is also a dependency of this new D7 site, so we're going with field_collection on drupal.org.

If @nnewton's performance testing finds this to be a problem, we can revisit it then. Until that point, carry on my friends.

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous

Issue summary: View changes

This issue is no longer postponed now that #1609382: Port release files with extra metadata to D7 is fixed.

Project: Drupal.org infrastructure » Drupal.org customizations
Component: Drupal.org module » Miscellaneous