Problem/Motivation

_field_info_collate_fields() is the central point of access for field and instance definitions, used by the various field_info_*() API funcs :
field_info_fields(), field_info_field_by_ids(), field_info_field(), field_info_field_by_id(), field_info_instances(), field_info_instance().

The function holds an array of all fields and instances across all entity types and bundles, as a monolithic static cache + persistent cache entry. The list is rebuilt or fetched from persistent cache on any request involving a fieldable entity (that is every request, since users are fieldable).

With free-for-all fieldable entity types, that array can rapidly grow out of hand (measurements are from @catch in the OP):

  • 140k on a fresh install
  • 1M on installing media + media_gallery contrib modules
  • 3M on non unrealistic sites
  • 10M+ on some commerce sites and other sites with many bundles/instances

Proposed resolution

Split the static and persistent caches per [entity type, bundle]. On each request, only the data about the fields and instances present in the bundles involved in the request are loaded from cache and kept in memory.

That approach stemmed from @catch's work on DrupalCacheArray, which aimed at solving similar issues with the cached theme and db schema registries. However, the special needs here (notably the correlation between fields and instances) make the DrupalCacheArray class itself unapplicable, and the code has evolved into its own thing for the needs at hand.

The FieldInfo class is the central provider of run time data about fields and instances. It holds the static and peristent caching logic.
As discussed in #16/#18 below, caching strategy is optimized for the following use case (most frequent across core and a couple 'big' contrib suspects - Views, Ctools, OG, Field permissions, Field collection, OG, Date, References, Relation) :
foreach ($instances in a given bundle) {get the $field; do something with $field and $instance;}.

"Give me all fields/instances" functions (field_info_fields(), field_info_instances()...) are as costly as before, and should be avoided when possible. The patch deals with the existing cases in core.

A spreadsheet is being compiled with a list of Contrib modules which use these costly functions, and which will need to be updated to take advantage of this patch: https://docs.google.com/spreadsheet/ccc?key=0AjNi8SoLpdaQdEFjaV9jY3ZwYkl...

Remaining tasks

Some things to consider in follow-ups:

User interface changes

None.

API changes

No API change strictly speaking.

  • The existing field_info_*() functions still take the same arguments and return the same values.
    Their relative cost, however, has changed: they all used to load all fields / instances in memory; some functions are now less costly than they were, which changes the recommended use patterns.

    - field_info_fields(), field_info_field_by_ids(), field_info_instances() are still persistently and statically cached. Their use is discouraged, since they have a important memory & CPU cost.
    - API ADDITION: field_info_field_map() returns a lightweight map of the use of fields across entity types and bundles. It should be able to replace most use cases for field_info_fields() / field_info_instances().
    - field_info_instances($entity_type, $bundle) reads the field+instance data from the persistent cache entry for the bundle, and is statically cached.
    - field_info_field($field_name), field_info_field_by_id($field_id), field_info_instance($entity_type, $field_name, $bundle) are static cache hits (no CPU/memory cost) if done on a field/instance previously loaded in cache by field_info_instances($entity_type, $bundle).

    The following code was the usual way so far, and is now not recommended (although it will not perform worse after the patch) :

    // Collect fields in advance to save function calls to field_info_field().
    $fields = field_info_fields();
    foreach (field_info_instances($entity_type, $bundle) as $field_name => $instance) {
      $field = $fields[$field_name];
      // Do something with the $field and $instance
    }
    

    Recommended way after the patch :

    // Let field_info_instances($entity_type, $bundle) 
    foreach (field_info_instances($entity_type, $bundle) as $field_name => $instance) {
      $field = field_info_field($field_name);
      // Do something with the $field and $instance
    }
    
  • The previous _field_info_prepare_*() helper functions, that prepare a raw field / instance definition for the current runtime context (fill in default settings values, replace unavailable widgets and formatters with defaults) have been internalized as methods in the class. Their signature needed to change anyway (they received a full $field array, which is too costly to get, while they actually only need a field type). There might be valid edge cases for calling them from 3rd party, so the methods have been kept public, and the D7 patch can provide BC versions.
  • API addition : field_read_fields($conditions) which reads fields definitions from the {field_config} table, has been modified to accept $conditions on {field_config_instance} ('entity_type' and 'bundle' : "give me the fields that have an instance in this bundle"). That was needed so that field_info_instances($entity_type, $bundle) can pre-cache the fields involved in the bundle.

While the code changes are fairly important, the patch prioritized 1) D7 backportability, 2) overall soundness and consistency of the new shape of the code, in that order. No changes were made just for the sake of it if they weren't needed to solve the D7 performance issue, but then where changed were needed I tried to push the code logic and shape it in a way that makes sense for a "clean-slate" D8.
Future possible D8 evolutions ($field and instances as classed objects / WSCCI plugins, definitions read from CMI) might shuffle things around, but the FieldInfo class introduced here might well stay in one form or another (and I'd advocate getting rid of the field_info_*() wrappers, but that's for another patch).

Original report by catch

The _field_info_collate_fields() cache grows rapidly as soon as you start installing contrib modules.

With vanilla core, _field_info_collate_fields() uses around 140k of memory.

Do this:

drush en -y media media_gallery
The following extensions will be enabled: styles, file_styles, multiform, media, media_gallery

Now, that cache is over 1mb - just due to a few default fields, instances and/or formatters. I've already seen a Drupal 7 site where it was over 3mb.

This bit of code isn't all that different from the equivalent Drupal 6 CCK code, I found it using over 3.5mb of memory on a client site, see #1009596: _content_type_info() memory usage improvements for the CCK issue (and a patch).

#729186: _field_info_collate_fields() resource usage helped with the CPU issues in D7 but not really the memory, or if it did slightly it was only a side effect.

Now the CCK patch was easy, but there are sufficient changes in the D7 logic that I'm not sure the simple approach from there (basically split/duplicate the cache into the parent functions on the assumption they won't all be called on most requests) will cut it.

Attached a screenshot to show the usage anyway.

CommentFileSizeAuthor
#246 ab-scenario-1-without-patch.txt1.26 KBDavid_Rothstein
#246 ab-scenario-1-with-patch.txt1.26 KBDavid_Rothstein
#246 ab-scenario-2-without-patch.txt1.26 KBDavid_Rothstein
#246 ab-scenario-2-with-patch.txt1.26 KBDavid_Rothstein
#246 ab-scenario-3-without-patch.txt1.26 KBDavid_Rothstein
#246 ab-scenario-3-with-patch.txt1.26 KBDavid_Rothstein
#246 scenario_1.sh_.txt1.06 KBDavid_Rothstein
#246 scenario_2.sh_.txt1011 bytesDavid_Rothstein
#246 scenario_3.sh_.txt1.25 KBDavid_Rothstein
#246 system-init-field-info-functions-hack-do-not-test.patch599 bytesDavid_Rothstein
#241 field-info-cache-1040790-240-D7.patch60.37 KByched
#241 interdiff.txt1.57 KByched
#238 field-info-cache-1040790-238-D7.patch60.41 KBgeerlingguy
#238 interdiff.txt7.42 KBgeerlingguy
#219 patch_field_info_collate_fields.png23.9 KBmariusdiacu
#210 field-info-cache-1040790-210-D7.patch60.43 KBgeerlingguy
#208 field-info-cache-1040790-208-D7.patch38.7 KBgeerlingguy
#201 field-info-cache-1040790-201-D7.patch60.57 KByched
#201 interdiff.txt1.76 KByched
#200 field-info-cache-1040790-200-D7.patch59.6 KByched
#200 interdiff.txt4.84 KByched
#199 field-info-cache-1040790-199-D7.patch59.18 KByched
#196 field-info-cache-1040790-196-D7.patch59.64 KByched
#196 interdiff.txt11.27 KByched
#176 1040790-176.patch51.54 KBswentel
#172 1040790-170.patch51.54 KBswentel
#172 interdiff.txt2.83 KBswentel
#168 1040790-168.patch61.17 KBswentel
#166 1040790-166.patch60.73 KBswentel
#163 1040790-163-D8.patch52.41 KBswentel
#146 field-info-cache-1040790-145-D8.patch53.5 KByched
#143 field-info-cache-1040790-143-D7.patch55.28 KByched
#143 interdiff.txt12.56 KByched
#144 field-info-cache-1040790-144-D7.patch55.15 KByched
#144 interdiff.txt700 bytesyched
#139 field-info-cache-1040790-139-D7-do-not-test.patch51.33 KBgeerlingguy
#135 field-info-cache-performance-tests.txt769 bytesgeerlingguy
#131 field-info-cache-1040790-131-D7.patch51.33 KByched
#131 interdiff.txt5.56 KByched
#127 field-info-cache-1040790-126-D7.patch50.53 KByched
#127 interdiff.txt11.46 KByched
#126 field-info-cache-1040790-125-D7.patch50.53 KByched
#116 field-info-cache-1040790-116-D7-do-not-test.patch49.49 KByched
#116 interdiff.txt3.96 KByched
#115 field-info-cache-1040790-115-D7-do-not-test.patch49.46 KByched
#115 interdiff.txt3.94 KByched
#113 field-info-cache-1040790-113-D7-do-not-test.patch47.29 KByched
#113 interdiff.txt635 bytesyched
#103 field-info-cache-1040790-103.patch47.29 KByched
#100 field-info-cache-1040790-99.patch47.2 KBswentel
#98 field-info-cache-1040790-98.patch28.66 KBswentel
#96 field-info-cache-1040790-89-D7-do-not-test.patch46.83 KByched
#89 field-info-cache-1040790-89.patch47.09 KBgeerlingguy
#89 interdiff.txt4.79 KBgeerlingguy
#83 field-info-cache-1040790-83.patch47.11 KByched
#83 interdiff.txt1.77 KByched
#79 field-info-cache-1040790-79.patch47.36 KBBerdir
#79 field-info-cache-1040790-79-interdiff.txt3.35 KBBerdir
#76 field-info-cache-1040790-75.patch45.11 KBBerdir
#65 field-info-cache-1040790-65.patch44.88 KByched
#65 interdiff.txt8.15 KByched
#62 field-info-cache-1040790-62.patch44.55 KByched
#62 fieldinfotest.module.txt2.47 KByched
#57 field-info-cache-1040790-57.patch49.27 KByched
#46 field-info-cache-1040790-44.patch48.31 KBAnonymous (not verified)
#44 field-info-cache-1040790-43.patch48.31 KBAnonymous (not verified)
#41 field_info_cache-1040790-41.patch48.3 KByched
#40 field_info_cache-1040790-40.patch47.37 KByched
#36 field_info_cache-1040790-36.patch45.72 KByched
#32 field_info_cache-1040790-32.patch47.04 KByched
#30 field_info_cache-1040790-30.patch43.29 KByched
#27 field_info_cache-1040790-27.patch10.01 KByched
#8 with_field_info_field_cache_bin.JPG274.44 KBpodarok
#8 itself_with_field_info_field_cache_bin.JPG104.12 KBpodarok
#7 _field_info_prepare_instance.JPG69.93 KBpodarok
#7 full-run.jpg156.37 KBpodarok
Desk 1_027.png100.12 KBcatch
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

yched’s picture

One approach could be splitting static caches about instances info by entity type - possibly bundles as well, but less sure about that.

I don't see how we can minimize the 'fields' info size, though. Fields are potentially used in any entity type.

catch’s picture

With fields I'm thinking cache it by bundle too - based on the instances. This would be more cache_get() on pages that show entities with different bundles, but each one would be loading much less information. Also there will not be many pages that load every single bundle on the site - but if they did then memory usage could be even worse.

Whatever we do there's going to be some kind of trade-off.

Berdir’s picture

Subscribing.

One of the reasons for creating drupal.org/project/properties is that the entity/field configuration is global and can become huge if you are trying to create large amount of bundles and fields.

yched’s picture

andypost’s picture

Coming from #1165826 which still mot marked as duplicate, I think both should be fixed

field_info_fields() also CPU-eater because iterates all fields mostly without reason, suppose we need to store deleted items in separate cache

file_get_file_references() the only usage of field_info_fields() should be rewritten with more lightweight implementation of this

podarok’s picture

podarok’s picture

here are xhprof on real project with

mysql> select count(0) AS `COUNT(*)` from `field_config_instance`;
+----------+
| COUNT(*) |
+----------+
|    15968 |
+----------+
1 row in set (0.02 sec)

all the 15968 created by commerce BTW
this run was recorded in xhprof without field_info_field cache bin

podarok’s picture

here are output with cached field_info_fields

the same database and codebase

catch’s picture

I don't have any code here yet, but thinking about this more:

- if we want to split the cache up, we have a choice between per-entity type and per-bundle. Per bundle is going to be more efficient if you have rarely used bundles (like a 'legacy' content type) that have their own set of fields, however there's also more risk of duplication.

- field_info_field() could ideally share the same cache as the per-bundle/type caches for fields and bundles to start with, then fall through to the full list if it's not found there.

- this might be another place where using ArrayAccess would allow us to do fairly complex caching without having to spill it out all over field module, berdir has a git sandbox for the schema issue, I just posted a new patch to #1011614: Theme registry can grow too large for MySQL max_allowed_packet and memcache default slab size doing something similar.

yched’s picture

Wow. For now, I'll just say that the current code is indeed in no way designed for 16.000 different instances. I don't think Drupal Commerce handles hundreds of different entity types, so I'd more likely suspect they are creating bundles like mad ?

If that # of instances is "legitimate", then caching per-entity-type will hardly cut it.

Pondering on the items 2 and 3 in @catch's #9 (I don't think I get item 2)

podarok’s picture

2yched

I don't think Drupal Commerce handles hundreds of different entity types, so I'd more likely suspect they are creating bundles like mad ?

this is possibly offtopic, but in that project there are 800+ types with 20-40 fields in it
duplication of fields was minimized via using same fields(bundles) for different types(instances)
with those huge fields database (50GB +) I`ve got a 10 seconds trouble - any page that used fields displayed in 10 seconds 8(

If You are interesting in it here are few issues that become real when there are more than 10K bundles(fields)
#1173108: Permissions page performance + #1165826: field_info_fields cache workaround fail + #435694: >1M data writes incompatible with memcached (memcached -I 32M -m32) + #1131980: commerce_product_ui_menu performance fail

2catch
We have a thin place in fieldapi when getting info about all the fields for getting fields for a single entity or a bundle...
We need be more specific in that place - without caching all the data but caching a data at different levels
Storing field_info_fields in database is like a bomb - serialized 10M blob(~500 types) unserialized 100M
FE
xxx_get_all_fields - returns an array or an object with only field names at all (1000 fields will be few KB in a single cache blob)
xxx_get_field_info($field_id) - returns particulary the same as _field_info_collate_fields() but only for a single field (a small cache blob too)

Damien Tournoud’s picture

@podarok: you are trying to bend Drupal 7 outside of its design specifications. Drupal 7 assumes that *entity types, bundles and fields are configuration*, which means that they can be (and are) loaded into memory at every request. A system that scales to a large number of fields would be designed completely differently, and would likely have worse performance in the simple cases.

podarok’s picture

2Damien Tournoud
nope, I`m using really standard core D7 config.

I just use a simple online store configuration with a 800 product types, created in commerce with a ~30 fields for each

WIth xhprof I`ll found a problem with memory in _field_info_collate_fields() and here I am 8)
As described at #1 - memory trouble is even at small D7 installations and I thought my configuration will be helpfull for this issue
FieldAPI really good for storing and displaing such huge number of fields with a memory trouble for _field_info_collate_fields() workaround

If we fix it - D7 would be a good solution for enterprise systems like electronic, car parts catalogs (10K+ types with 50+ fields each) or any other

Berdir’s picture

@podarok

You misunderstood Damien. He meant that you are using Drupal 7 in a way for which it was not designed to be used.

Have you looked the Properties project that I linked above? ( http://drupal.org/project/properties ). One of the main reasons for creating that was that you practically *can not* create hundreds of field instances. It works around that by being a single field that has an more or less unlimited amount of key/value pairs, grouped in categories. It is limited to strings as values currently, so it doesn't work for anything but it allows to create any number of pre-defined templates, which are *not* read into memory.

The fields system in D7 simply has not been designed for what you are doing (creating 800 product types). I don't think that can simply "be fixed", certainly not in D7, as that requires architectural changes to really solve this properly.

yched’s picture

@podarok : everyone agrees there's a data size (memory + CPU) issue with _field_info_collate_fields(), and the discussion in this thread will hopefully come to enhance that (with the particular requirement that a solution for D7 needs to avoid breaking the field_info_*() API).

However, as for any performance and optimization issue, there needs to be a target as to what range of data size we should support with a "reasonable" performance, so that we can reason on the necessary tradeoffs. "800 bundles & 16.000 field instances" seems out of reach (and scope), and definitely not the main case we should try to optimize for.

catch’s picture

Version: 7.x-dev » 8.x-dev
Issue tags: +Needs backport to D7

@podarok, if you are using exactly the same set of fields for two different bundles of the same entity type, then that sounds wrong overall. It's likely the changes here would help your use-case too, but they might not make it workable with a small memory limit still.

I have the CacheArrayObject class at #1011614: Theme registry can grow too large for MySQL max_allowed_packet and memcache default slab size working well now, it would be possible to start work on a patch here based on that API (although it may well change before that patch gets anywhere near commit).

Some thoughts, I need to review the order in which the various field_info_*() functions get called on most requests, but iirc the instance functions tend to get called, then the field functions are called usually for fields that are already somewhat known about from instances. If this is correct:

_field_info_collate_fields() should switch to returning a CacheArrayObject. It's a private function so that's fine.

Try to allow field_info_instances() and field_info_instance() to then trigger loading instances (+ possbily the field definitions themselves for those instances) cached per bundle. The instances and fields would be stored for later retrieval during the same request in the CacheArrayObject.

If we stored both instances and fields per-bundle, then field_info_instances($entity_type, $bundle); _field_info_collate_fields() will load all the instance and field definitions for that bundle into memory. If I later run field_info_field(), it can re-use that same data. This means one cache request per bundle per page. We might load duplicate data, but the field info array inside the CacheArrayObject can be unique, so it won't be duplicated in the static at least.

field_info_field() could then consult the CacheArrayObject() to see if the field definition was already loaded, if not fall through to either the full field list, or an individual cache entry. This assumes that most of the time it's called when there's already an instance known about, which may not be the case.

functions that need to return a huge list of fields or bundles or instances across all entity types to the caller we can't do much about, but this won't happen on most requests.

podarok’s picture

2catch

if you are using exactly the same set of fields for two different bundles of the same entity type

nope
Every content type (bundle) has its own set of fields, but some fields are common.
but anyway... it is possibly offtopic...

yched’s picture

re @catch #16 :
I'd say the assumptions and the approach look correct.

Below is a a small audit of the use of the field_info_*() API functions (at least those that fetch from _field_info_collate_fields()), with core + a couple 'big' contrib suspects (Views, Ctools, OG, Field permissions, Field collection, OG, Date, References, Relation)
Disclaimer, though : the Entity API module does advanced (and convoluted) stuff with fields (entity metadata et al), I'm not familiar enough with it to evaluate the performance impact over there. We wight want to bring @fago in here.
I also didn't dare looking through Drupal Commerce :-/

field_info_field($field_name) / field_info_field_by_id($field_id)
Those are generally called after knowing an instance (sometimes collected higher in the callstack, no problem). So having instances pre-fetch their fields sounds like a good pattern.
For other cases (code inquiring about a specific field, e.g "field settings page for field_foo", tests, or views field handlers, that generally deal with a field across bundles), that will mean a possible extra db hit, should be OK.

field_info_fields() / field_info_field_by_ids()
Then we have the "return all fields" functions : .
We'll need to load every field definition, but we should probably make sure those do not remain in the persistent "on-demand" cache (and possibly not in the static cache as well ?)
field_info_field_by_ids() is mainly used internally, and each time the code could use field_info_field_by_id($id) instead. We can keep the function around, but our current core use cases should be moved to field_info_field_by_id().

field_info_instance($entity_type, $field_name, $bundle_name)
Perfect match for filling the static and persistent "on-demand" caches, and pre-fetching the corresponding $fields. We can probably load all fields and instances for the whole bundle while we're at it.

field_info_instances($entity_type = NULL, $bundle_name = NULL)
That one is a bit special, because of the two optional params.
- field_info_instances(NULL, NULL), like field_info_fields(), needs to load all definitions, and should probably not keep anything in the persistent cache (nor in the static ? in which case there's no need to pre-fetch the corresponding $fields)
- Same for field_info_instances($entity_type, NULL) (very rare).
- field_info_instances($entity_type, $bundle_name), like field_info_instance(), is a perfect match.

yched’s picture

Also : I'm leaving tomorrow for a whole month away from my coding env. Not completely AFK, but I won't be able to help much code-wide :-/.

omercioglu’s picture

xjm’s picture

Crossposting #1220168: Standardize information returned by _field_info_collate_fields() storage bins for consideration in whatever fix is implemented.

catch’s picture

Issue tags: +blocked

This is waiting for #402896: Introduce DrupalCacheArray and use it for drupal_get_schema(). I already have 3-4 patches in the queue which are duplicating code from that issue and don't want to add any more, so I'm tagging this as blocked.

catch’s picture

Issue tags: -blocked

And now it's not!

yched’s picture

Started to look into this now that DrupalCacheArray is in.

I'm actually not sure how we should best leverage the tool here :
- One single DCA object for both fields and instances across all entity types and bundles (filled on demand for each separate entity_type+bundle) ? - means one single db cache entry, which can still theoretically grow huge and hit MySQL max_allowed_packet limit ?
- Two separate DCAs for fields and instances ?
- Different DCAs for each entity_type+bundle ?

catch’s picture

I think we want a cache entry per bundle including both the instances and the field info triggered by field_info_instance()

I don't know if we want to make those individual entries DrupalCacheArray or not - depends if there is field and instance information that might not get requested I guess.

We then need to figure out how that interacts with requests for field info - it may be the fields are a DCA and this checks for field information that was pulled out for bundles, then falls back to its own cache (or the db) if not. That bit is going to be tricky I think.

There's also situations like when all instances get returned or all fields. We might want to not cache those calls at all (pull from the db and call alter hooks) since that should be rare. Eventually we could look at ArrayAccess + Iterator for things like that - started work on this at #853864: views_get_default_view() - race conditions and memory usage.

It may be we just end up using ArrayAccess for part of this since it's not going fit neatly in to the pattern used for schema and theme registry. Will try to have a more in-depth look at this myself in the next couple of days.

yched’s picture

I started to work on some code myself, not nearly finished, but I will try to post it later today.

yched’s picture

Status: Active » Needs work
FileSize
10.01 KB

Posting what I have. This is still largely rough code, unpolished and unfinished - notably, not tied with the actual field_info_*() functions yet.

The FieldInfoFieldCache class extends DrupalCacheArray for now, but uses a different logic around the persistent and static caches (one DCA but several cache entries, static data stored not only in $this->storage, and massaged before returning the accessed values), so that in the end it has little in common. At some point it might be clearer to drop the reference to DCA completely, not sure yet.

Guiding principles :
- We want separate cache entries for each bundle, to keep them to reasonable sizes. Only load / unserialize / keep in memory the data for the bundles used in the page. That's one cache_get() per bundle used on the page, I think this is a fair trade_off.
- For each bundle, the persistent cache stores the list of instances, and the list of corresponding fields. Bundles are populated 'as a whole' : all instances and fields for the bundle; I don't think it makes sense to have on-demand loading of specific fields or instances within a given bundle.
- The static per-bundle entries ($this->storage["$entity_type:$bundle"]) do not hold field definitions, but only a list of field ids, to avoid in-memory duplication of fields shared in several bundles. When the persistent cache entry for the bundle is loaded in memory (in offsetGet()), the field definitions are taken out and merged in a global list of fields ($this->fields_by_id)
- When per-bundle entries are accessed (offsetGet()) or persisted (__destruct()), the relevant field definitions are merged back into the returned value.

bibo’s picture

Subscribing.

yched’s picture

For the record, I'm still working on this, but I've had limited core time recently. I'm currently working in #1305362: Helper issue for #1040790 to have a testbot playground for the work-in-progress patch without cluttering this issue.

Still a fair amount of code polish ahead when the tests over there come back green, though.

yched’s picture

Status: Needs work » Needs review
FileSize
43.29 KB

Still some polish needed, and a couple @todos, but we should be getting close.
I'll try to post a human-reviewable patch with a patch summary shortly.

Status: Needs review » Needs work

The last submitted patch, field_info_cache-1040790-30.patch, failed testing.

yched’s picture

Here's a green and almost final patch (still has a couple docs @todos).
I'll try to wrap a summary asap.

yched’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, field_info_cache-1040790-32.patch, failed testing.

Berdir’s picture

Interesting.

Is this still applicable for a D7 backport? The functions which are moved around are prefixed with an _, sure, but this is quite a huge change...

yched’s picture

Status: Needs work » Needs review
FileSize
45.72 KB

So much for "here's a green patch". For some reason patch #32 included a revert of my #1391384: Wrong static clear on taxonomy_vocabulary_save() fix and thus failed the tests I added over there. That's called self-biting I guess...

Updated patch fixes that and clears a couple doc @todos.

@Berdir: yes, this is definitely done with a D7 backport as a target.
The changes on existing functions only involve a couple _internal_funcs(). For some of them (field_info_prepare_*()) I suspect there might be legitimate cases for calling them from contrib, but the D7 version of the patch can preserve BC implementations.

xjm’s picture

Status: Needs review » Needs work
Issue tags: +Novice

Wow, big patch! It will be great to see this get in. Somewhere in the middle I switched from reading the code to just doing a docs review, which follows.

  1. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    + * The FieldInfo class serves informations about the active field and instance
    + * definitions in the current runtime environment.
    

    This should be one line, 80 chars or less, and start with a third-person verb. Also, "information" is not pluralized. How about simply:

    Provides information about active field and instance definitions.

  2. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    + * It holds static and persistent caches of the collected data. The methods ¶
    + * retrieving all fields or all instances across all bundles do not read from ¶
    + * the caches (and do not populate the caches either). As such, they should be ¶
    + * used sparingly.
    

    Trailing whitespace at the end of these lines. Also, because of the above change, I'd start the function either with "Holds..." or "This class holds..."

  3. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    + * cache entries (the static cache ). Cache entries are loaded for
    

    Unneeded space before the parenthesis here. Also, I think we can fit more words on this line?

  4. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +   * Lightweight map of fields across entity types and bundles,
    

    Typo here (ends with a comma instead of a period).

  5. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +   * Information about the contents of bundles ($instance definitions and
    +   * extra fields), keyed by entity type and bundle name.
    

    We should try to word this such that it fits in one line.

  6. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +   * Constructs a.FieldInfo object.
    

    I think there is a typo here (a period instead of a space).

  7. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +  protected function persistentCacheGetBundleInfo($entity_type, $bundle) {
    +    if ($cached = cache('field')->get("bundle_info:$entity_type:$bundle")) {
    +      return $cached->data;
    +    }
    +  }
    +  ¶
    +  /**
    

    Trailing whitespace after this method.

  8. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +   * This method does not read from nor populates the static and persistent
    +   * caches.
    ...
    +   * This method does not read from nor populates the static and persistent
    +   * caches.
    ...
    +   * This method does not read from nor populates the static and persistent
    +   * caches.
    

    These should read:
    "does not read from nor populate..."

  9. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +   * Returns a field definition from a field id.
    ...
    +   *   The field id.
    

    "ID" should be uppercase in both these lines.

  10. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +   * Ratrieves all active, non deleted instances definitions.
    

    "Non-deleted" should be hyphenated. Edit: Also, there's a typo ("Ratrieves").

  11. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +   * @return
    +   *   If $entity_type is not set, return all instances keyed by entity type and
    +   *   bundle name. If $entity_type is set, return all instances for that entity
    +   *   type, keyed by bundle name.
    

    We can remove the word "return" here.

  12. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +    // We need the field type for each instance. Rather than loading all field
    +    // definitions in memory, retrieve field types with an ad-hoc query.
    

    Do we really need to call the query "ad-hoc"? :P

  13. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +  /**
    +   * @todo
    +   *
    +   * @param $entity_type
    +   *   The entity type.
    +   * @param $bundle
    +   *   The bundle name.
    +   * ¶
    +   * @return
    +   *   An array with the following key/value pairs:
    +   *   - 'instances': the list of instance definitions.
    +   *   - 'extra_fields': the list extra fields.
    +   *   The function also populates the correpsonding field definitions in the
    +   *   static cache.
    +   */
    
    1. Let's add summaries for all these @todo.
    2. There's a bit of trailing whitespace between the params and return value.
    3. We don't need quotes around the names of the array keys. Also, I think the "The" after the colons should be capitalized.
    4. Additionally, let's move the line about populating the cache up to a 2nd paragraph (after the summary).
  14. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +      $this->staticCacheSetBundleInfo($entity_type, $bundle, $info);
    +      return $info;
    +    }
    +    ¶
    +    // Cache miss: collect the information from the database.
    

    Trailing whitespace here.

  15. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +      // The persistent cache additionally contains the definitions of the
    +      // fields involved in the bundle.
    +      $cache = $info + array('fields' => array());
    +      foreach ($info['instances'] as $instance) {
    +        $cache['fields'][] = $this->staticCacheGetFieldbyId($instance['field_id']);
    +      }
    +      $this->persistentCacheSetBundleInfo($entity_type, $bundle, $cache);
    +    }
    +    ¶
    +    return $info;
    +  }
    

    Trailing whitespace just above the return value.

  16. +++ b/core/modules/field/field.info.incundefined
    @@ -6,6 +6,681 @@
    +      // Fallback to default formatter if formatter type is not available.
    

    REALLY nitpicky point, but "Fall back" should be two words when it's a verb.

Phew! :)

Adding a novice tag in case someone wants to clean up the points listed above and write one-line summaries where there are currently @todo before others get back to it.

xjm’s picture

Status: Needs work » Needs review

So I didn't mean to mark this NW actually. Conversation in IRC about how dreditor sets it to NW reminded me. :) Sorry.

Anonymous’s picture

will run some benchmarks and a do review of this tomorrow - nice work!

yched’s picture

Fixes @xjm's remarks in #37.

yched’s picture

Some more refactoring.
Moving the static and persistent cache gets and sets to separate helper methods really helped make the code flow legible in the rest of the class, but the method names themselves sucked big time (staticCacheGetFieldbyId(), persistentCacheSetBundleInfo()...).

Factorized those into cacheGetX() / cacheSetY() methods, with CACHE_STATIC and CACHE_PERSISTENT class constants as params.

Anonymous’s picture

Status: Needs review » Needs work

this fatal errors for me on install. but the bot green-lights it. ruh roh.

Fatal error: Undefined class constant 'CACHE_PERSISTENT' in /var/www/drupal8/webroot/core/modules/field/field.info.inc on line 258

will see if i can fix that.

xjm’s picture

Issue tags: -Novice

Untagging.

Anonymous’s picture

Status: Needs work » Needs review
Issue tags: +Novice
FileSize
48.31 KB

attached patch fixes the PERISTENT issue - was just a typo.

only change vs 41 is s/PERISTENT/PERSISTENT/.

Anonymous’s picture

Issue tags: -Novice
Anonymous’s picture

ok, another fatal, sacheSetFieldMap(), fixed with attached patch.

+    if ($map = $this->cacheGetFieldMap(self::CACHE_PERSISTENT)) {
+      $this->sacheSetFieldMap(self::CACHE_STATIC, $map);
+      return $map;
+    }
yched’s picture

Oops, #41 came back green with 0 passes, which gets easily overlooked. Sorry about the typos.

Anonymous’s picture

#47 no worries, this is awesome work.

taking me a while to work through this patch, because i still need to get up to speed on the way this data changes and is queried.

yched’s picture

Issue summary: View changes

Patch summary.

yched’s picture

Issue summary: View changes

added API addition on field_read_fields()

yched’s picture

Added an issue summary.

yched’s picture

Issue summary: View changes

formatting

yched’s picture

bump - anyone up for review ?

drzraf’s picture

I would be happy to test this if there were a "light" way to consistently measure the memory consumption.
Something like drush ev '[RAM usage]; _field_info_collate_fields(); [RAM usage]' or something using mysqldump db {cache_field}|wc -c, so we can do several tests easily before and after the patch.

podarok’s picture

#46

clean drupal 8.x today git clone
with manually created module for filling one entity_type 'node' , bundle 'test_type', with 1000 fields(instances)

# mysql -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1013
Server version: 5.5.20-log Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use d8
Database changed
mysql> select count(*) from field_config_instance;
+----------+
| count(*) |
+----------+
|     1009 |
+----------+
1 row in set (0.00 sec)

all tested on freebsd 9.0-RELEASE-amd64bit Core2Quad Q9300 CPU + 6GB RAM +
apache 2.2.21
php 5.3.9 (all dynamic extensions)
mysql 5.5.20
all 64bit builds
no opcode cachers

without patch #46

truncated cache tables run

xhprof data

Overall Summary
Total Incl. Wall Time (microsec): 3,144,291 microsecs
Total Incl. CPU (microsecs): 1,666,097 microsecs
Total Incl. MemUse (bytes): 69,812,944 bytes
Total Incl. PeakMemUse (bytes): 78,344,640 bytes
Number of Function Calls: 141,986

cached run


Overall Summary
Total Incl. Wall Time (microsec): 656,892 microsecs
Total Incl. CPU (microsecs): 342,460 microsecs
Total Incl. MemUse (bytes): 41,655,360 bytes
Total Incl. PeakMemUse (bytes): 48,087,848 bytes
Number of Function Calls: 17,281

with patch #46

truncated cache tables run


Overall Summary
Total Incl. Wall Time (microsec): 3,398,989 microsecs 8,1% lost
Total Incl. CPU (microsecs): 1,630,657 microsecs 2,2% won
Total Incl. MemUse (bytes): 69,860,680 bytes 0,1% lost
Total Incl. PeakMemUse (bytes): 76,471,984 bytes 2,4% won
Number of Function Calls: 146,831 3,41% lost

cached run


Overall Summary
Total Incl. Wall Time (microsec): 728,249 microsecs 10.9% lost
Total Incl. CPU (microsecs): 340,286 microsecs 0,6% won
Total Incl. MemUse (bytes): 41,559,464 bytes 0,2% won
Total Incl. PeakMemUse (bytes): 47,992,400 bytes 0,2% won
Number of Function Calls: 18,288 5,8% lost

for the bottom results i`d use most repeatable result from ~5-10 page runs

apachebench for a custom page


function pfield_menu(){
$items = array();

  $items['pfield'] = array(
    'title' => 'Pfield test',
    'page callback' => 'pfield_form',
    'access arguments' => array('access devel information'),
  );

return $items;
}

function pfield_form($form, &$form_state){
$form = array();
$mem = field_info_instances('node', 'test_type'); // just for test huge array from 1000+ fields
                                            
$form['pfield_test'] = array(
 '#markup' => '<pre>'.print_r('hello', TRUE).'</pre>'
);
return $form
 

without patch


# ab -c5 -n1000 "http://127.0.0.1/drupal/?q=pfield"
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Server Software: Apache/2.2.21
Server Hostname: 127.0.0.1
Server Port: 80

Document Path: /drupal/?q=pfield
Document Length: 1648597 bytes

Concurrency Level: 5
Time taken for tests: 171.750 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 1649080000 bytes
HTML transferred: 1648597000 bytes
Requests per second: 5.82 [#/sec] (mean)
Time per request: 858.750 [ms] (mean)
Time per request: 171.750 [ms] (mean, across all concurrent requests)
Transfer rate: 9376.59 [Kbytes/sec] received

with patch


# ab -c5 -n1000 "http://127.0.0.1/drupal/?q=pfield"
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Server Software: Apache/2.2.21
Server Hostname: 127.0.0.1
Server Port: 80

Document Path: /drupal/?q=pfield
Document Length: 1648597 bytes

Concurrency Level: 5
Time taken for tests: 168.670 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 1649080000 bytes
HTML transferred: 1648597000 bytes
Requests per second: 5.93 [#/sec] (mean) (in other run got 5.7#/sec :( )
Time per request: 843.349 [ms] (mean)
Time per request: 168.670 [ms] (mean, across all concurrent requests)
Transfer rate: 9547.83 [Kbytes/sec] received

~ 1.8% speedup at all
I do not see any speed/memory optimization here
Or possibly show me right way to test it

catch’s picture

filling one entity_type 'node' , bundle 'test_type', with 1000 fields(instances)

You won't see an improvement if you add 1,000 instances to a single bundle. A better data set for testing this (with the same number of overall bundles) would be:

- ten bundles
- 100 instances per bundle
- load instance info for three of these bundles during the request.

podarok’s picture

Ok @catch

I`ll try to test such config

podarok’s picture

Status: Needs review » Reviewed & tested by the community

ok...
another test

14 bundles of node entity
10 of em with 100 fields

function pfield_form($form, &$form_state){
$form = array();
$mem = field_info_instances('node', 'test_1');
$mem2 = field_info_instances('node', 'test_2');
$mem3 = field_info_instances('node', 'test_3');

$form['pfield_test'] = array(
 '#markup' => '<pre>'.print_r($mem3, TRUE).'</pre>'
);

/*
// the code for bundle and field generation
 $num_bundles = 10;
 $num_fields = 100;

for ($y = 0; $y < $num_bundles; $y++) {
field_attach_create_bundle('node', 'test_'.$y);
for ($i = 0; $i < $num_fields; $i++) {

$field = array(
 'field_name' => 'pfield'.$i,
 'type' => 'text',
);

field_create_field($field);

$instance = array(
 'field_name' => 'pfield'.$i,
 'entity_type' => 'node',
 'bundle' => 'test_'.$y,
 'label' => 'pfield_label'.$i,
 'widget' => array('type' => 'text_textfield', 'settings' => array('size' => 60)),
 'display' => array(
    'default' => array(
       'label' => 'above', 'type' => 'text_default', 'settings' => array())),
);
 field_create_instance($instance);
}
}
*/

return $form;
}

without patch

xhprof

Overall Summary
Total Incl. Wall Time (microsec):	318,519 microsecs
Total Incl. CPU (microsecs):	280,442 microsecs
Total Incl. MemUse (bytes):	43,985,304 bytes
Total Incl. PeakMemUse (bytes):	45,825,528 bytes
Number of Function Calls:	15,132

apachebench

# ab -c5 -n1000 "http://127.0.0.1/drupal/?q=pfield"
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        Apache/2.2.21
Server Hostname:        127.0.0.1
Server Port:            80

Document Path:          /drupal/?q=pfield
Document Length:        168282 bytes

Concurrency Level:      5
Time taken for tests:   143.457 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      168765000 bytes
HTML transferred:       168282000 bytes
Requests per second:    6.97 [#/sec] (mean)
Time per request:       717.285 [ms] (mean)
Time per request:       143.457 [ms] (mean, across all concurrent requests)
Transfer rate:          1148.84 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.6      0      18
Processing:   452  716 293.0    637    3088
Waiting:      293  535 283.1    458    2926
Total:        452  716 292.9    637    3088

Percentage of the requests served within a certain time (ms)
  50%    637
  66%    754
  75%    821
  80%    878
  90%   1060
  95%   1303
  98%   1566
  99%   1671
 100%   3088 (longest request)

with patch

xhprof output

Overall Summary
Total Incl. Wall Time (microsec):	361,425 microsecs 13,4% lost
Total Incl. CPU (microsecs):	315,218 microsecs 12,4% lost
Total Incl. MemUse (bytes):	21,415,336 bytes 51.3% won
Total Incl. PeakMemUse (bytes):	22,897,072 bytes 50% won
Number of Function Calls:	34,433 127% lost

apachebench

# ab -c5 -n1000 "http://127.0.0.1/drupal/?q=pfield"
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        Apache/2.2.21
Server Hostname:        127.0.0.1
Server Port:            80

Document Path:          /drupal/?q=pfield
Document Length:        168282 bytes

Concurrency Level:      5
Time taken for tests:   144.301 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      168765000 bytes
HTML transferred:       168282000 bytes
Requests per second:    6.93 [#/sec] (mean)
Time per request:       721.507 [ms] (mean)
Time per request:       144.301 [ms] (mean, across all concurrent requests)
Transfer rate:          1142.12 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0      10
Processing:   441  720 375.7    584    3097
Waiting:      313  577 363.7    439    2762
Total:        441  720 375.7    584    3097

Percentage of the requests served within a certain time (ms)
  50%    584
  66%    741
  75%    816
  80%    899
  90%   1152
  95%   1486
  98%   2021
  99%   2261
 100%   3097 (longest request)

looks like this patch is a good memory optimization fix
possibly we can also do some work for multiple function call optimization and get a more nice code execution

but anyway 50% memory optimization is good enouph for making it RTBC

catch’s picture

Status: Reviewed & tested by the community » Needs work

OK this looks great overall. I didn't do an in depth review yet so I may have missed finer points of the logic, but generally caching by bundle, and not caching when functions are trying to get everything all at once looks great.

A few things that came up reviewing:

+++ b/core/modules/field/field.api.phpundefined
@@ -1647,11 +1647,14 @@ function hook_field_storage_details_alter(&$details, $field) {
+    // By the time hook_field_storage_load() runs, the relevant fields have

Could we use the class name rather than 'static cache' here? This also looks more like it's commenting on the lines changed in the patch than recommending what to do, since it's hook documentation, could we put the performance advice in the phpdoc instead?

+++ b/core/modules/field/field.info.incundefined
@@ -6,6 +6,730 @@
+  /**
+   * Retrieves a field definition from cache.
+   *
+   * @param $cache
+   *   This parameter is only present for consistency and is not actually used.
+   *   Only self::CACHE_STATIC is implemented, fields only enter the persistent

Hmm I'm not sure about this, does it really need to be consistent? If the problem is having $cache as the first parameter to everything else, what about swapping the parameter order?

+++ b/core/modules/field/field.info.incundefined
@@ -6,6 +6,730 @@
+   * @return
+   *   The contents of the bundle, as returned by getBundleInfo(), if present in

This returns if it's from the persistent cache too?

+++ b/core/modules/field/field.info.incundefined
@@ -6,6 +6,730 @@
+   * Stores the contents of a bundle in the static cache.
+   *
+   * @param $cache

Again this is either persistent or static so the docs should cover both.

+++ b/core/modules/field/field.info.incundefined
@@ -6,6 +6,730 @@
+      // Save in the static cache (fields only enter the persistent cache as
+      // part of a full bundle).

Looking at this, I'm wondering if it wouldn't be easier to have separate methods for persistent and static cahing. So $this->staticCacheSetField() and $this->persistentCacheSetField() or similar.

+++ b/core/modules/field/field.info.incundefined
@@ -6,6 +6,730 @@
+   * The function also populates the correpsonding field definitions in the

Typo for 'corresponding'.

+++ b/core/modules/field_ui/field_ui.moduleundefined
@@ -344,7 +344,10 @@ function field_ui_inactive_instances($entity_type, $bundle_name = NULL) {
-  $active_instances = field_info_instances($entity_type);
+  $active_instances = field_info_instances($entity_type, $bundle_name);
+  if (!empty($bundle_name)) {
+    $active_instances = array($bundle_name => $active_instances);

Is this only for performance, or is there an API change?

+++ b/core/modules/field_ui/field_ui.testundefined
@@ -298,7 +298,7 @@ class FieldUIManageFieldsTestCase extends FieldUITestCase {
-    _field_info_collate_fields_reset();

For Drupal 7 we might possibly want to keep the old function and wrap the new one?

yched’s picture

Many thanks for the profiling data, @podarok.

Attached patch fixes @catch's remarks, except those on cacheGetX() / cacheSetX() methods.

As I wrote earlier, I wasn't fully satisfied with the code legibility until I moved the caching logic in dedicated methods, but @podarok's numbers in #55 make me reconsider, since this adds function calls in the critical path.
With a primed static cache, field_info_instance($entity_type, $field_name, $bundle) makes :
- 1 nested function call in current D7/D8 : _field_info_collate_fields()
- 3 nested function calls with the patch : _field_info_field_cache()->getBundleInfo(), which calls cacheGetBundleInfo().

Moving from a giant statically cached array to a class with methods means we get two func calls (get the object, call a method) instead of one (get the array, read the part we need), but we should probably stop there and favor performance over legibility.

Will try to inline the cache methods back and see how this translates in XHProf.

- re: field_ui_inactive_instances() : yes, that's performance only, no API change. The current code calls the expensive field_info_instances($entity_type) in all cases, even when a $bundle_name is provided. My take is we could drop support for $bundle_name = NULL, but I don't want to do this in this patch. New patch makes the code in there more explicit and adds comments.

- re: _field_info_collate_fields_reset() : sure, this could be kept as a BC function in D7. I keep it removed in the current D8 patch.

podarok’s picture

Status: Needs work » Needs review
podarok’s picture

looks like testing bot happy, so will try xhprof results in 24 hours against #55

yched’s picture

@podarok : #57 has no real changes that would deserve another round of profiling.

I want to explore a bit the impact of cacheGetX()/cacheSetX() methods, I'll do that hopefully tonight or tomorrow.

podarok’s picture

ok, following up

yched’s picture

So after spending some time with XHProf and ab, turns out the cacheGetX()/cacheSetY() methods add like 3% CPU overhead compared to when all caching logic is inlined. So I removed them in favor of the inlined version. A bit less friendly to read (see the getBundleInfo() method), but 3% is worthy.

New patch attached.

My setup (test code attached as well) :
20 node types
On each node type, 20 text fields :
- 10 fields shared across all node types
- 10 fields specific to the node type
That's 210 fields, 400 instances across 20 bundles. A "medium+" sized site.
+ devel_generate to create 100 nodes with dummy field data.

Memory measured with XHProf, CPU measured with "ab -n 300 -c 1"

1st test : Display all the fields on one entity (aka node/%nid)
- 8.x:
Requests per second: 13.80/s
Time per request: 72.441 ms (s/d 5.0)
Total Incl. PeakMemUse 15,662,856 bytes
- patch:
Requests per second: 14.51/s
Time per request: 68.941 ms (s/d 4.5) --> CPU -4,83%
Total Incl. PeakMemUse 8,384,776 bytes --> Memory -46,5%

2nd test: Display 10 entities (all fields) across 10 different bundles (aka "worst-case listing page")
- 8.x:
Requests per second: 6.61/s
Time per request: 151.173 ms (s/d 5.6)
Total Incl. PeakMemUse 17,905,632 bytes
- patch:
Requests per second: 6.64/s
Time per request: 150.651 ms (s/d 4.5) --> CPU -0,35%
Total Incl. PeakMemUse 13,871,712 bytes --> Memory -22,5%

catch’s picture

This looks good to me, I don't think losing the methods is too bad for readability.

Minor issues:

- the 'static cache' is just a class property since the class itself is held in a static, we should call it something else.

- There's some very minor whitespace things like extra blank lines.

Wondering if beejeebus got any further with his review?

yched’s picture

the 'static cache' is just a class property since the class itself is held in a static, we should call it something else

Dang. You puzzle me, actually. I'll try to see if I can make things clearer in the class docblock, but I'd really prefer to keep the expression "static cache" in the code comments within the methods.

Even though not directly based on PHP static vars, conceptually those properties really act as a static, 'request only' cache as opposed to data read from the persistent cache bin. The various methods are just implementations of the common pattern present all over core :

if (value present in static cache) {
  return value;
} 
if (value present in persistent cache) {
  save in static cache;
  return value;
}
build value from scratch;
save in static and persistent caches;
return value;

For efficiency I removed the explicit getCache()/setCache() methods which made the pattern very explicit in code, but I tried to keep it clear in code comments. If we can't use the words 'static cache' there while it's the expression used throughout all of core for this pattern, then all I'm left with is some fairly cryptic code :-(.

[dishonest mode] The rest of core says "static cache" but doesn't directly use PHP static vars either, they use a userland construct based on a function that "happens" to have the word 'static' in its name :-p. [/dishonest mode]

yched’s picture

What about this ?
- tried to make the phpdoc block for the class clearer
- added "quotes" around the term 'static' in code comments.
+ additionally fixes extraneaous arg passed to hook_field_storage_details_alter()

Interdiff attached.

podarok’s picture

#65
what about testing ab -c5 ?
today we mostly have a multicore environments and c5 is more realistic in results

catch’s picture

Thanks yched, those comments look better to me.

I don't have any more complaints here and the performance numbers look great. Given this has been profiled already I don't see a need to do benchmarks - trust xhprof over ab any day.

yched’s picture

Given this has been profiled already I don't see a need to do benchmarks - trust xhprof over ab any day.

actually, in #62, CPU measurements come from ab, memory measurements come from xhprof. I couldn't get xhprof to produce somewhat consistent and reproducible cpu ratios between runs, whereas the ab processing times over 300 runs were fairly stable.

yched’s picture

@catch : what is missing before we can get this in ?

catch’s picture

Status: Needs review » Reviewed & tested by the community

I was hoping someone else would rtbc it, but let's try that and see if any additional reviewers show up, if not I'll commit soon.

sun’s picture

I didn't test this patch and I'm not able to comment on the functional code changes at this point.

However, I'd like to see the phpDoc warnings/notes to be shortened and clarified, as it seems that it's important that developers understand the implications of using whichever functions quickly and crystal-clearly.

To achieve that, put the most important info upfront. Skip the preceding "Warning:" or "Note:" hint. Since all of the functions seem to be cached, but only some retrieve a particular key/item of the cache, whereas the others can lead to memory issues, it's a bit hard to distill the important bits, but the docs should really try hard to communicate that clearly and quickly.

I'm also missing concrete pointers or examples for when it is appropriate to use the memory-hog-functions. It sounds like that's discouraged and almost not supported anymore (which would make sense to me), but the functions are still around.

E.g.:

 * Returns [blub blaah].
 *
 * Retrieves the requested field only.

and

 * Returns all field definitions.
 *
 * Retrieves all defined fields. Use field_info_field() instead, if possible.
yched’s picture

Status: Reviewed & tested by the community » Needs work

I'll try to address @sun's remarks tonight.

catch’s picture

Issue tags: +Novice

Tagging this as novice for the comment changes only, see #71 for what needs to change relative to the current patch.

Berdir’s picture

Issue tags: -Novice

Removing the Novice tag, I don't see this as a Novice tag just because it's only about comments :) #71 merely contains an example and basically says "Improve the documentation" and for that you need to understand the code, and that isn't that easy.

geerlingguy’s picture

Subscribing, and adding that I have two sites now where this issue has reared its ugly head; one community-based site lets people set up 'networks' with profile2 profiles (for each network), and people can put any number of fields into their profiles... this means that we already have a few thousand field instances, and that site's field_info_field blob is 6.1MB.

I've checked through the patch, and code-wise/functionality wise it looks like it's good. Haven't tested it, but will try to review docs and submit a new patch if I can get a few spare moments. (Right now, this issue is pushing cache_clear's on one of the two sites over our memory limit, and causing some other minor performance issues, so it's in our best interest to get this backported too :).

Berdir’s picture

Status: Needs work » Needs review
FileSize
45.11 KB

Attaching a re-roll of the patch from #65, no actual changes yet.

Status: Needs review » Needs work

The last submitted patch, field-info-cache-1040790-75.patch, failed testing.

pounard’s picture

Why is there an init() method on the FieldInfo class? You could just write protected $fieldMap = array(); (default property values) in class properties declarations, and get rid of both the constructor and the init method, this would be more CPU wise and code would be shorter.

Instead of isset() on the static cache arrays you could use array_key_exists(), this ensures you would not retry a second cache load in some cases, where the key would be here but could be NULL due to a previous failure; Another solution is to set an explicit FALSE on every miss (non existing field id or name or whatever) in order to ensure that isset() would return TRUE for non existing identifiers (if you don't like array_key_exists(), but that's a matter of taste).

EDIT: The previous paragraph is important, because I'm afraid that unstored invalid identifier could cause the underlaying cache get attempts and framework to do unwanted extra SQL or cache backend queries.

Berdir’s picture

Status: Needs work » Needs review
FileSize
3.35 KB
47.36 KB

This should pass the tests, not sure if these two assertions are new or why they fail now. It is a change because previously, we returned "empty" bundles (= no fields) and now we don't do that anymore. Not sure if we that's ok.

I also moved the FieldInfo class to PSR-0 and tried to address the problem with trying to access missing fields. Not sure if that could result in problems because it worked differently previously, for example code that creates a field if it doesn't exist yet and runs multiple times, e.g. for multiple bundles. No idea if the caches are correctly cleared when the field is actually created. Let's see if there any test fails because of this

I also have no clue how this overlaps with the new Field API, any chance @yched or others working on this can check that?

Attaching an interdiff for the changes suggested in #78, there are a few other changes and the move that's not in there, because that makes the interdiff useless (moving and changing code at the same time).

yched’s picture

@Berdir : thanks for grabbing the task.
I'll try to follow up with a review later today - meanwhile, if that's ok with you, I'll upload the patch in a branch in the new "D8 Field API" sandbox we opened for the we're currently running sprint.

Berdir’s picture

@yched: Sure, that's ok.

yched’s picture

yched’s picture

Status: Needs review » Needs work
FileSize
1.77 KB
47.11 KB

Reviewed Berdir's changes in #79, adjusted some of them.
See attached interdiff.

+++ b/core/modules/field/lib/Drupal/field/FieldInfo.phpundefined
@@ -213,7 +201,7 @@ class FieldInfo {
+    if (isset($this->fieldIdsByName[$field_name]) || array_key_exists($field_name, $this->fieldIdsByName)) {

I don't think the array_key_exists call is needed here, because we never store NULL in fieldIdsByName, we store MISSING_FIELD_ID.

+++ b/core/modules/field/lib/Drupal/field/FieldInfo.phpundefined
@@ -230,6 +218,11 @@ class FieldInfo {
+      $this->fieldsById[self::MISSING_FIELD_ID] = NULL;

OK, this allows us to do return $this->fieldsById[$this->fieldIdsByName[$field_name]] when we return directly from "static" cache above. Seems a bit hackish though.
I'd prefer handling the case where fieldIdsByName[$field_name] == MISSING_FIELD_ID explicitly when we return from static cache.
See attached interdiff.

+++ b/core/modules/field/lib/Drupal/field/FieldInfo.phpundefined
@@ -285,10 +281,13 @@ class FieldInfo {
   public function getBundleInfo($entity_type, $bundle) {
     // Read from the "static" cache.
-    if (isset($this->bundleInfo[$entity_type][$bundle])) {
+    if (isset($this->bundleInfo[$entity_type][$bundle]) || (isset($this->bundleInfo[$entity_type]) && array_key_exists($bundle, $this->bundleInfo[$entity_type]))) {
       return $this->bundleInfo[$entity_type][$bundle];
     }
 
+    // Make sure that the bundle info is initialized.
+    $this->bundleInfo[$entity_type][$bundle] = NULL;

Those changes don't seem needed since the code below always puts a non NULL / non-empty value in $this->bundleInfo[$entity_type][$bundle]

We currently don't check whether the [$entity_type, $bundle] pair is a valid, existing bundle.
I started doing so at some point, but it introduced some nasty race conditions & dependency on entity_get_info() caches, and I reverted that.
So I'd I'd go for not checking the validity of bundle. If some code asks for info about a non-existing node type, we cache (both statically and persistently) the fact that we don't have any fields on that node type. Broken code repeatedly asking for non existing bundles will therefore add more "empty" cache entries, but that's not a big deal, those cache entries won't bloat the cache entries that get loaded in memory for valid bundles.

yched’s picture

Status: Needs work » Needs review

Dunno why this switched to "need work".
I still need to get back at @sun's remarks in #71, but let's have the bot run meanwhile.

yched’s picture

(side note : I forgot to push the changes corresponding to patch #83 in the field-info-cache-1040790-berdir sandbox. Did that just now.)

geerlingguy’s picture

Status: Needs review » Reviewed & tested by the community

I've tested the patch on D8, and before, there's one row with all the field info (field_info_fields) that's over 100KB. After patch, there is a row per entity type (field_info:bundle:node:article and the like), and each cached entry per-bundle is only about 8KB.

Tests passed on the previous patch (which applied cleanly on my local computer), I think this patch looks great (and WOW there is a lot less data being passed around on one page request in my particular instance, instead of having an 8MB field_info_fields row... now it's about 10-50KB per entity type), and docs are reasonable.

Memory savings ftw!

geerlingguy’s picture

Status: Reviewed & tested by the community » Needs work

Ah, still haven't incorporated docs suggestions from sun in #71, so back to needs work, but imo, only the docs need some work. Taking a stab at this now, but only have a little time, so might not be able to get too far.

geerlingguy’s picture

Didn't get time to work on this, and probably won't by next week :(

geerlingguy’s picture

Status: Needs work » Needs review
FileSize
4.79 KB
47.09 KB

Back to CNR: I've made a few spelling corrections and moved around/edit a couple bits of text for clarity. Everything seems clear to me. For D8, maybe it makes sense to remove some of the now 'wrapper-ish' functions like field_info_field_by_ids(), but that would have further implications beyond what this patch is trying to achieve.

The warnings about memory use are fairly clear to me (especially after rearranging a couple docblocks), but please highlight what specific bits might be vague or too verbose still.

Status: Needs review » Needs work
Issue tags: -Performance, -memory, -Needs backport to D7

The last submitted patch, field-info-cache-1040790-89.patch, failed testing.

yched’s picture

Status: Needs work » Needs review
Issue tags: +Performance, +memory, +Needs backport to D7

#89: field-info-cache-1040790-89.patch queued for re-testing.

yched’s picture

#89 is green, the fail is #1779638: Unexplained test failure in LocaleFileImportStatus->testBulkImportUpdateExisting().

Thanks @geerrlinguy !
The comment changes are fine by me.

For D8, maybe it makes sense to remove some of the now 'wrapper-ish' functions like field_info_field_by_ids(), but that would have further implications beyond what this patch is trying to achieve.

Fully agreed, there might totally be some API refactors worth considering around here, but let's do that in followups. The API around getting field and instance definitions will probably be impacted by the "convert Field API to CMI" patch.
Also, the FieldInfo object will definitely be read from the DIC at some point, but we'll see that later. (#1708692: Bundle services aren't available in the request that enables the module is still a blocker for this anyway)

The goal here is to keep changes minimal for an easy D7 backport.

I've been willing to see whether a fix for #1400256: Tests for: $info['fields'], not always set? could be baked into this patch, but that can also happen in a followup and shouldn't prevent anyone from RTBCing this :-)

geerlingguy’s picture

Issue summary: View changes

Updated issue summary.

geerlingguy’s picture

Status: Needs review » Reviewed & tested by the community

Since I just changed a few comments, didn't touch code, and since the testbot and my manual testing confirmed no issues with the patch in #83 (now #89), I'm marking RTBC.

I've also updated the Issue Summary with more details, and some things to consider as follow-ups.

yched’s picture

Status: Reviewed & tested by the community » Needs review

The current patch should fix #1400256: Tests for: $info['fields'], not always set?. The steps-to-reproduce for this bug mostly exist on the D7 side, though, so I'm working on the D7 backport patch (being tested right now at #1305362-38: Helper issue for #1040790).

yched’s picture

Status: Needs review » Reviewed & tested by the community

crosspost :-)

yched’s picture

Issue summary: View changes

Better formatting, added example, clarified RTBC status.

yched’s picture

Attached is a D7 backport. Passes tests : #1305362-47: Helper issue for #1040790

catch’s picture

Status: Reviewed & tested by the community » Needs work

Sorry folks I went to commit this but it no longer applies - hunk in field.info.inc, quick re-roll?

swentel’s picture

Status: Needs work » Needs review
FileSize
28.66 KB

Re-roll, hope it's ok, did it really quick, so wait for the bot :)

Status: Needs review » Needs work

The last submitted patch, field-info-cache-1040790-98.patch, failed testing.

swentel’s picture

Status: Needs work » Needs review
FileSize
47.2 KB

Meh forgot a file.

yched’s picture

#100 should be correct, thks @swentel !

yched’s picture

Status: Needs review » Needs work

Ah wait, the patch was broken by #1757504: Regression: language field is not visible on manage display, made changes in _field_info_prepare_extra_fields(). We need to report the changes to the new location of the function. Working on it.

yched’s picture

OK, here it is.

yched’s picture

Status: Needs work » Needs review
bojanz’s picture

Status: Reviewed & tested by the community » Needs review

Tried to test the D7 backport on a test instance of Commerce Kickstart, got:

Fatal error: Call to undefined function _field_info_collate_fields() in modules/contrib/entity/entity.module 

.
(both on frontpage and on update.php). So, contrib will need some updates I guess?

Status: Needs review » Needs work

The last submitted patch, field-info-cache-1040790-103.patch, failed testing.

yched’s picture

Status: Needs work » Needs review

@bojanz : aw, right, some contrib modules call _field_info_collate_fields() directly :-/. I guess we'll need to bake some more BC in the D7 version. Will try to look into that later today.

yched’s picture

Status: Needs review » Reviewed & tested by the community
catch’s picture

Status: Reviewed & tested by the community » Needs work

The last submitted patch, field-info-cache-1040790-103.patch, failed testing.

yched’s picture

Status: Needs work » Needs review
Issue tags: +Performance, +memory, +Needs backport to D7

#103: field-info-cache-1040790-103.patch queued for re-testing.

@catch : it's the same unrelated fail - it's all over the queues these days :-)
Let's try to be lucky, but #103 is green, really.

yched’s picture

Status: Needs review » Reviewed & tested by the community

Green :-)

yched’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
635 bytes
47.29 KB

@bojanz: can you try with the attached D7 patch ?
(interdiff with #96 attached)

bojanz’s picture

Status: Reviewed & tested by the community » Needs review

This needs more testing.

Applying the patch to a Kickstart v1 (D7 + Entity API + Rules + Commerce) install worked for me, but applying it to Kickstart v2 (Kickstart v1 + many contribs such as message, title, fences, crumbs, etc) didn't (breached 128M memory limit).

Preliminary poking showed that the problem is with calling field_info_instances() from hook_field_extra_fields(), because that causes it to loop back to FieldInfo::getBundleInfo(), over and over. This is something that Title does, as well as Message, as well as Commerce (and yet I didn't trigger from Kickstart v1, but maybe that's because that site uses less memory, so the 128M limit is further away).

After commenting out the call to hook_field_extra_fields() from inside FieldInfo::getBundleInfo(), I got the site to run.
Initial Kickstart v2 install without the patch: 1.5M.
With the patch: 0.25M.
(I used memory_get_usage(TRUE), which might be a bit too crude)

yched’s picture

@bojanz: what about this one ? (also includes @David Rothsein's test from #1400256: Tests for: $info['fields'], not always set?)
(interdiff with #113 attached)

yched’s picture

Sorry, rather this one.
(interdiff with #113)

bojanz’s picture

(Still testing Kickstart v2, as a heavy enough environment to represent a typical site)

This one works. However...
"drush cc all" now takes significantly longer (about a minute and a half I'd say).
The first page load after the "drush cc all" exceeds the 30s load time and crashes.
(This happens for every "drush cc all", not just the initial one after applying the patch)
After that it works, and I see a consistent decrease in memory usage (from 0.85M to 1.25M per page as reported by devel).
I'm also seeing a consistent increase in page load times, between 200ms and 500ms per page.
So it seems there's still something fishy going on.

I can xhprof this on sunday if nobody beats me to it.

Berdir’s picture

Tested this on 7.x as well, seeing similar results. looking at with xhprof shows 18 calls to FieldInfo::getFields() through field_info_fields(), which is called by commerce_info_fields() (commerce_product_reference_entity_view()) and _nodeconnect_get_ref_fields() in mycase.

I'm quite sure that commerce can be optimized to get rid of these calls, especially because we're not actually using commerce_product_reference on this site at all. Don't have time to look into that right now, @bojanz, can you have a look at that? Another performance issue related to that function: #1567090: Unconditional call to entity_metadata_wrapper() in commerce_product_reference_entity_view().

swentel’s picture

Did a test with the D7 patch as well on a current site:

- 138 fields
- 305 instances

yes it's a complex site :)

Before patch:
Page execution time was 112.85 ms. Memory used at: devel_boot()=1.82 MB, devel_shutdown()=16.98 MB, PHP peak=17.25 MB.
After patch:
Page execution time was 84.28 ms. Memory used at: devel_boot()=1.82 MB, devel_shutdown()=11.82 MB, PHP peak=12.25 MB.

Also as bojanz reported, drush cc all takes a long time, first call to the page didn't go white though.
Memory drop is fantastic, and didn't see an increase on page loads so far.

I'm going to run this one in pre-production for a few days because I only briefly browsed around on the site so far. Client is giving in a lot of content and browsing a lot, so I'll hear reports from them soon enough once something might go wrong.

geerlingguy’s picture

Status: Needs review » Needs work

(Firstly, sorry for the brain-dump of a comment to follow… I’ve been debugging for an hour or so, and can’t piece together everything as succinctly as I’d like.)

After applying the patch to a local D7 site, I kept getting out of memory errors (with 256MB RAM allocated in memory_limit), in different field files (field.crud.inc, field.info.inc, field_sql_storage.module, etc.).

I was using the APC module and caching cache_field and some other caches in APC, but this was causing out of memory errors even when I had PHP’s memory limit set to 1, 2, 3 or 4 GB (yikes!). So, I turned off APC caching, and instead used database caching (which helped the httpd process stay below 200MB for the most part). But now the load switched from memory to CPU, and the max time limit kept getting hit (I tried letting it run for 15 minutes at one point).

When I could get some debug output, after 15 minutes of a process running, I found that there were tons and tons of calls to field_read_instances(), each with slow-running queries to {field_config_instance} and {field_config} that took about 43ms (looks like these calls were coming from getFieldMap()). One such query is:

SELECT fci.* FROM field_config_instance fci INNER JOIN field_config fc ON fc.id = fci.field_id WHERE (fc.active = :db_condition_placeholder_0) AND (fc.storage_active = :db_condition_placeholder_1) AND (fc.deleted = :db_condition_placeholder_2) AND (fci.deleted = :db_condition_placeholder_3)

The below query was also called over 450 times, and I don't think it was cached:

SELECT field_name, type FROM field_config WHERE active = 1 AND storage_active = 1 AND deleted = 0

The entire time, CPU usage spiked at 100% and stayed there.

So… for a site with 550 fields, 5,049 field instances, 12 node types, and 479 profile types (so 491 entity bundles), it looks like this patch isn’t very helpful when trying to build caches…

swentel’s picture

Yes, I was to hopefull but had to revert as well. Clearing cache, but also simply adding a new field, was a real pain. In our case, cache was handled by memcache which sometimes got completely crazy. I'm going to dig in this a bit during the weekend.

geerlingguy’s picture

If I statically cache the results of the second query in #120, I still run out of execution time after 470 calls to the first query and 800 seconds of time.

yched’s picture

Version: 8.x-dev » 7.x-dev

Ok, real-life use cases are on D7 right now, so I think we'd better switch this to D7.
When we get a correct D7 patch, we can come back to the usual "fix in D8 then backport" way.

yched’s picture

Regarding the slowness reported in #120:

I'd be surprised if getFieldMap() was a problem. It's statically and persistently cached, and even in contexts where cache gets cleared 10 times a request, I don't see how the request could get painfuly slow, even with 5000 records in field_config_instance

What I think is the issue, though, is that the patch changes field_info_fields() and field_info_instances() (the ones that return *all* fields and *all* instances) to not be statically nor persistently cached. As per the early posts in this discussion (around #20 / #30), it was decided to stop caching those.
While I think that's valid for D8, it's not a behavior we can change for D7. While working on this patch earlier this spring, I grepped a couple "big" contrib modules, and there's an *awful* amount of code out there that calls field_info_instances(), even in places where it could do differently - just because they can : field_info_[fields|instances()] currently cost nothing, everything is in memory anyway.
I filed a couple patches against a couple of those modules back then, but it's a lost battle.

- In D8, there will be smarter ways to navigate through the list of fields and instances to find a specific one (the fieldMap is one).
- In D7, we need to keep field_info_fields() and field_info_instances() fully cached - just not loaded in memory by default on every page, only when explicitly requested.
Then, a call to one of those will have a cost (load or rebuild a huge array, have it sit in memory), but a series of calls - which is what I think is happening on the cache clears mentioned in the comments above) will mean no extra cost.
Contrib modules calling those in regular page views will need to be fixed individually.

I see no other way around for D7. We can fix field_info_instances($entity_type, $bundle), we cannot fix calls to the "gimme all" versions, only try to progressively move away from them.

Current D7 patch, being a straight backport of the D8 patch, still has field_info_fields() and field_info_instances() uncached, which is why IMO we have those issues on cache clears.

geerlingguy’s picture

I see no other way around for D7. We can fix field_info_instances($entity_type, $bundle), we cannot fix calls to the "gimme all" versions, only try to progressively move away from them.

Current D7 patch, being a straight backport of the D8 patch, still has field_info_fields() and field_info_instances() uncached, which is why IMO we have those issues on cache clears.

That sounds workable. The main offender, in my case, is Profile2, and since it's hopefully a candidate for core inclusion, it's already being cleaned up quite a bit... And this kind of conversion would probably be considered as part of the inclusion cleanup. (See #1668292: Move simplified Profile module into core).

yched’s picture

Status: Needs work » Needs review
FileSize
50.53 KB

Attached patch adds back static caching for field_info_fields() and field_info_instances().

Tested on Commerce Kickstart 2, full setup with demo content :
- without patch, 'drush cc all' takes 30 seconds
- with patch, 'drush cc all' takes 32 seconds

Also :
I actually installed kickstart 2.0-beta2 a couple days ago.
Back then, the patch #116 would die on an infinite recursive loop, because commerce_product_reference_field_extra_fields() was calling field_info_instance().
I needed to split collecting extra fields into a separate function and cache entry to be able to get a cache clear to run.

Since bojanz was able to test the patch in #117, I assume the code has changed in kickstart 2.0-rc1.
It seems wise, however, to keep this separation. I don't know what else we have in contrib. That means an extra cache hit per bundle rendered in the page.

So, attached patch:
- adds back static caching on getInstances() and getFields()
The static lists are shared with getField($field_name) and getBundleInstances($entity_type, $bundle), so that we don't bloat memory with twice the same fields and instances.
Still @todo : add persistent caching ? Not sure, given the numbers above
- moves getBundleExtraFields() to a separate method

yched’s picture

Oops, missed a rename.
Attached interdiff with #116.

Status: Needs review » Needs work

The last submitted patch, field-info-cache-1040790-126-D7.patch, failed testing.

yched’s picture

Sharing the static lists raises an issue because they keep track of non-existent fields and bundles :

field_info_instances('non_existsing_entity_type') -> array()
field_info_instances('non_existsing_entity_type', 'non_existing_bundle') -> array()
field_info_instances('non_existsing_entity_type') -> array('non_existing_bundle' => array())

field_info_field_by_ids() -> array(1 => $field_1, 2 => $field_2)
field_info_field_by_id(99) -> NULL
field_info_field_by_ids() -> array(1 => $field_1, 2 => $field_2, 99 => NULL)

Hence the test fails above.

Not sure about the right fix.

pounard’s picture

I see two different fixes (probably more are possible):
1) Keep two statics, one with real data, and the other one which only says 'exists' or 'does not exist' (TRUE or FALSE for example), when requesting a single field, attempt lookup in the second, and if exists then pick the value from the first. When requesting the complete list just return the first.
2) Proceed to array_filter() calls when returning it complete.
EDIT: While 2) is simpler, it's also less performance wise (it creates a whole lot of new arrays with array_filter()), so I think 1) is a good path even if a bit more complex.

yched’s picture

@pounard :
Right, I considered array_filter() too, but we don't want to create copies of the massive arrays.
Attached patch goes for something like your approach 1).

It passes the 'field info' tests, let's see about the rest.

Patch also updates phpdocs, since field_info_fields() and field_info_instances() do statically cache their stuff now.

yched’s picture

Status: Needs work » Needs review
pounard’s picture

It looks fine, but I don't know the whole patch so don't consider my judgement is reliable. It needs to be manually tested and measured I guess.

geerlingguy’s picture

Version: 8.x-dev » 7.x-dev
Status: Needs review » Reviewed & tested by the community

Awesome results! This patch halved the page load times for most pages on my test site. The site has 541 fields, 4,629 field instances, 12 node types, and 490 profile types. I've attached a file with results from three page loads, with a cache clear and a normal page load (cached), on two different pages on the test site. One page was complex, with a couple views, a few entities loaded, etc. Another page was a simple node page.

Here's a quick summary of the results (before and after the patch):

Before Patch

  • Complex page, cache clear all: 11,324ms, 186MB peak memory use
  • Complex page, normal/cached load: 1,473ms, 93.25MB peak memory use
  • Simple page, cache clear all: 11,281ms, 185.35MB peak memory use
  • Simple page, normal/cached load: 1,221ms, 88MB peak memory use

After Patch

  • Complex page, cache clear all: 30,680ms, 174.5MB peak memory use
  • Complex page, normal/cached load: 668ms, 34.75MB peak memory use
  • Simple page, cache clear all: 31,991ms, 175.35MB peak memory use
  • Simple page, normal/cached load: 463ms, 32MB peak memory use

Differences

  • Complex page, cache clear all: +19,356ms, -11.5MB peak memory use
  • Complex page, normal/cached load: -805ms, -58.5MB peak memory use
  • Simple page, cache clear all: +20,710ms, -10MB peak memory use
  • Simple page, normal/cached load: -758ms, -56MB peak memory use

After the caches are built, page load times and memory usage more than halve. This is better than I had expected! For smaller sites, this will only mean a couple MB savings at most... and cache clears won't take nearly as long as my tests (I don't know of any other site that has as many entities, but I imagine Commerce and Profile2-heavy sites may come close).

Anyways, I'd gladly take an extra bit of time during a cache clear (which only happens every few days on this particular site) for 2x faster page loads and half the memory requirements!

[Edit: One more note: if you look at the detailed results file in the next comment, you'll notice I have the sizes of the cache_field table before and after the patch listed as well. Before the patch, it's about 6 MB. After, it's over 100 MB. I wouldn't be able to cache table in APC after the patch (limited RAM), but that's fine with me since the entire 6 MB field doesn't need to be loaded on every request. Just something to note.]

geerlingguy’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
769 bytes

Whoops! I forgot to attach the detailed results file...

webchick’s picture

Version: 7.x-dev » 8.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Cool! Let's get some results from a few more people. Also moving to 8.x and to be ported.

Lars Toomre’s picture

FYI... When next re-rolled, please check for white spaces at the end of lines. There were two instances in the patch in #131.

geerlingguy’s picture

Yeah, noticed that, but forgot to mention.

geerlingguy’s picture

Attached patch for D7 fixes whitespace errors.

yched’s picture

@geerlingguy : thanks for the tests (and reroll) ! It would be good to hear from Commerce Guys guys (sorry, lame, couldn't resist :-p), but your numbers are very encouraging.

Regarding the size of the cached entries : what the patch does is store fields + instances involved in a given bundle in a dedicated cache entry for the bundle. Thus, $field definitions get duplicated in the cache entries for the bundles it appears in.

Going from 6Mb to 100Mb seems a lot, though. Beware that this cache table also contains the field values cache.
Would be interesting to check how many "field_info:bundle:[entity_type]:[bundle]" entries you have, and how many fields are stored in each of them, see if that matches your sites field map.

Also - we still need to add back persistent caching of "all fields" and "all instances" IMO. I'm hoping to be able to roll a patch later tonight

webchick’s picture

Version: 8.x-dev » 7.x-dev
Status: Patch (to be ported) » Needs work

Oops. Guess this is still NW then. :(

bojanz’s picture

Sorry, traveling at the moment, will try to test the latest patch as soon as I stop (in about 2 days).
Great work so far!

yched’s picture

Attached patch:
- adds back persistent caching for field_info_fields() and field_info_instances()
- exposes the "lightweight" field map (now includes field type) through a new field_info_field_map() function, which should cover most current use cases for field_info_fields() / field_info_instances(), so that contrib can start getting rid of those expensive calls.
- updates phpdocs for the new caching behavior (we do cache full lists)

yched’s picture

Status: Needs work » Needs review
FileSize
700 bytes
55.15 KB

Fixes phpdoc for Fieldinfo::getExtraFields(), was blindly copy pasted from getBundleInstances().

yched’s picture

Version: 7.x-dev » 8.x-dev

And the corresponding D8 patch.

I'm going afk til the end of October. I'm leaving it to you folks to drive this home :-)

yched’s picture

Er, D8 patch attached.

yched’s picture

Issue summary: View changes

Adjusted follow-up task description.

yched’s picture

Updated the "API changes" section in the issue summary, to match the current patch.

yched’s picture

Issue summary: View changes

Updated for latest changes

Status: Needs review » Needs work
Issue tags: -Performance, -memory, -Needs backport to D7

The last submitted patch, field-info-cache-1040790-145-D8.patch, failed testing.

yched’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, field-info-cache-1040790-145-D8.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
Issue tags: +Performance, +memory, +Needs backport to D7
pounard’s picture

I applied the latest D7 patch version and I get exceptions thrown by search api, when rebuilding views data, when flushing the caches. I'm not giving any details right now but I'm not sure of the source, but this wasn't happening before applying the patch. Ignore that this doesn't seem related, just a weird coincidence.

catch’s picture

#134 is very, very encouraging. I think it's fine to trade the extra execution time on a cache miss for better runtime performance, especially considering peak memory usage is a lot lower as well.

joachim’s picture

Version: 7.x-dev » 8.x-dev
Status: Reviewed & tested by the community » Needs work

I've applied this patch to my dev site that has commerce and a ton of fields on lots of bundles, and it certainly seems to make page loads quicker. No error messages on cache clear.

+++ b/modules/field/field.info.inc
@@ -34,7 +57,25 @@ function field_info_cache_clear() {
+function _field_info_collate_fields($reset) {

@@ -162,284 +203,6 @@ function _field_info_collate_types($reset = FALSE) {
-function _field_info_collate_fields($reset = FALSE) {

This function has changed its signature.

Does this patch needs a change request, or is it ok because it's a private function?

geerlingguy’s picture

Status: Needs work » Needs review

Using D7 patch in #144, clearing caches takes about 5 seconds longer, but normal page load times and memory usage were the same. It's crazy how much faster the site feels with this patch. Most pages load in half the time they used to!

Hoping to hear more of the same from some commerce user... If all that's needed is a quick test of before/after with a default commerce install, I'd be glad to do that, but it seems like someone who has an actual site with real data to test would probably be preferable.

pounard’s picture

When I tested it over a complex commerce site with 100+ fields here, I didn't see any difference in page generation time at all (tested on pages displaying 30 products and 30 associated nodes). Important note thought: the windows box I'm working on spends half the time on drupal_load() and PDO::execute() alone. Aside of that, drupal_alter() take more time than every other field API operation (3000+ drupal_alter() calls dues to the field API alone). I didn't notice any difference when clearing the caches. After all that I got unrelated troubles with Search API.

This is an objective feedback, aside of that, all other feedbacks are really encouraging.

swentel’s picture

Did two testing scenarios (with the D7 patch), created a setup with a drush script (works for D7 and D8) to quickly create content types and generate a number of fields (currently only textfields though) - see http://drupal.org/sandbox/swentel/1781486 - just run 'drush fact anynumber' (defaults to 10)

Scenario 1
7 content types, 286 fields and instances, 100 nodes - 1 node type has 236 fields, others have between 10 and 26.

Request on frontpage - nodes from various types generated by devel generate (100)
Before: Page execution time 130.94 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=11.14 MB, PHP peak=11.5 MB.
After: Page execution time 134.38 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=10.38 MB, PHP peak=10.75 MB.

Request on node with 236 fields
Before: Page execution time 131.76 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=11.27 MB, PHP peak=12.75 MB.
After: Page execution time 127.95 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=10.28 MB, PHP peak=11.75 MB.

Request on node with 16 fields
Before: Page execution time 62.61 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=10.06 MB, PHP peak=10.25 MB.
After: Page execution time 51.03 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=4.98 MB, PHP peak=5.25 MB.

Request on node with 4 fields (standard core article content type)
Before: Page execution time 57.77 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=10.02 MB, PHP peak=10.25 MB.
After: Page execution time 48.71 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=4.74 MB, PHP peak=5 MB.

Scenario 2

10 content types, 50 fields each (so yeah total of 500), 1000 nodes

Request on frontpage - nodes from various types generated by devel generate
Before: Page execution time 113.48 ms. Memory used at: devel_boot()=0.98 MB, devel_shutdown()=14.95 MB, PHP peak=15.25 MB.
After: Page execution time 107.48 ms. Memory used at: devel_boot()=0.98 MB, devel_shutdown()=11.12 MB, PHP peak=11.5 MB.
On the frontpage, only 7 different node types where rendered.

Request on node with 50 fields
Before: Page execution time 79.62 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=14.07 MB, PHP peak=14.75 MB.
After: Page execution time 62.09 ms. Memory used at: devel_boot()=0.99 MB, devel_shutdown()=5.62 MB, PHP peak=6.25 MB.

geerlingguy’s picture

Status: Needs review » Needs work

Whoops—we need to get an answer for #154:

This function has changed its signature.
Does this patch needs a change request, or is it ok because it's a private function?

Other than that, it seems like this can go back to RTBC...

catch’s picture

#154 just needs review for me, and doesn't matter for 8.x commit.

geerlingguy’s picture

Status: Needs work » Reviewed & tested by the community

#146 seems fine to me; works for D8 in manual testing (creating fields, deleting fields, viewing nodes, etc.). I didn't do any performance benchmarks on it, though, since I only had a couple bundles and a few fields. Anyone else able to test and confirm RTBC for D8?

Then we could set this back to CNW to resolve the question mentioned in #154 for D7, after a D8 commit.

catch’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Performance, +memory, +Needs backport to D7

The last submitted patch, field-info-cache-1040790-145-D8.patch, failed testing.

swentel’s picture

Status: Needs work » Needs review
FileSize
52.41 KB

Reroll of D8 patch

Status: Needs review » Needs work

The last submitted patch, 1040790-163-D8.patch, failed testing.

swentel’s picture

Ok, now that's a nice surprise, I'll recheck this evening.

swentel’s picture

Status: Needs work » Needs review
FileSize
60.73 KB

Ok, needed some re-rolling after #1326634: Clean up API docs for field module and #1785256: Widgets as Plugins got in - c'mon bot!

- edit - also removed the t() calls in fieldInfoTest.php, but mistakenly thought that was a new file, will need a new re-roll tomorrow. see #1797106: Remove t() from asserts messages in tests for the Field module for more background.

Status: Needs review » Needs work

The last submitted patch, 1040790-166.patch, failed testing.

swentel’s picture

Status: Needs work » Needs review
FileSize
61.17 KB

Ok, let's see what this does.

Status: Needs review » Needs work

The last submitted patch, 1040790-168.patch, failed testing.

xjm’s picture

Issue tags: +Avoid commit conflicts
xjm’s picture

Issue tags: -Avoid commit conflicts

Spoke to @swentel and re-un-tagging based on the current status here since there are functional parts of the _ficf() patch that are broken after widgets-as-plugins. Sorry for the noise!

swentel’s picture

Status: Needs work » Needs review
FileSize
2.83 KB
51.54 KB

Alright here we go - thanks to Stalski as well for the debugging.

swentel’s picture

#172: 1040790-170.patch queued for re-testing.

geerlingguy’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me. Here are some quick performance comparisons, from an upgraded site with about 25 fields shared across 8 node types (bundles) - 3 runs each for comparison:

Before Patch

  • Cache clear: 636.37ms, 25.75MB RAM
  • Complex Page load: 438ms, 23.25MB RAM
  • Simple Page load: 483, 23MB RAM

After Patch

  • Cache clear: 640ms, 25.5MB RAM
  • Complex Page load: 555ms, 23MB RAM
  • Simple Page load: 470ms, 22.75MB RAM

The page load time for the complex page is a little odd, don't know why it took longer after patch, but RAM savings are apparent, even though slight on a smaller site. I think swentel said he was going to do a little testing on a larger site instance too.

swentel’s picture

Did a test like the last one in #157 for the D8 patch as well:

10 content types, 50 fields each (so yeah total of 500), 1000 nodes (devel generate)

Request on frontpage - nodes from 7 different node types
Before: Page execution time was 184.68 ms. Memory used at: devel_boot()=1.18 MB, devel_shutdown()=15.47 MB, PHP peak=15.75 MB.
After: Page execution time was 178.27 ms. Memory used at: devel_boot()=1.19 MB, devel_shutdown()=12.5 MB, PHP peak=12.75 MB.

Request on a single node page with 50 fields
Before: Page execution time was 95.89 ms. Memory used at: devel_boot()=1.18 MB, devel_shutdown()=14.41 MB, PHP peak=14.75 MB.
After: Page execution time was 72.91 ms. Memory used at: devel_boot()=1.2 MB, devel_shutdown()=5.72 MB, PHP peak=6 MB.

swentel’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Avoid commit conflicts
FileSize
51.54 KB
Berdir’s picture

Status: Needs review » Reviewed & tested by the community

Still looks good. Let's get this in, it's about time :)

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Really good to see this RTBC again. Committed/pushed to 8.x!

geerlingguy’s picture

Status: Patch (to be ported) » Needs review

#144: field-info-cache-1040790-144-D7.patch queued for re-testing.

I think that the D7 patch should be RTBC if tests pass, but there was a lingering question in #154: "This function [_field_info_collate_fields()] has changed its signature. Does this patch needs a change request, or is it ok because it's a private function?"

pounard’s picture

I'm thinking that a lot of modules may use this function. Even if the underscore denotes a private function, I guess that a lot of modules will be tempted to use it.

tim.plunkett’s picture

#180, #1288364: Clean up reset patterns in field.info.inc already changed the signature, and there was no change notice. It's a private function, if they used it that's their problem.

I grep'd through major contrib modules, and all I could see was one call in entity.module that would need to be changed to _field_info_field_cache()->flush();

geerlingguy’s picture

Status: Needs review » Reviewed & tested by the community

if they used it that's their problem

I agree wholeheartedly. Now that I think about it... everywhere private functions are mentioned, there are heavy warnings to NOT use them in external code, and to do so at your own risk (see two examples).

Therefore, with a passing test in #144, I'm RTBCing.

yched’s picture

I *really* can't see any use case for 3rd party code to use _field_info_collate_fields() (no param) directly, we have wrapper API funcs for that. Any module doing so could be rightfully considered broken, but I seriously doubt there is any.

There are direct calls to _field_info_collate_fields(TRUE) (reset the cache), though. Entity API is one example (see #105/#107 above).
Entity API is not *supposed* to do that, but I guess it has kind of valid reasons - possibly because the official field_info_cache_clear() API func also clears the "list of available field types, widgets, formatters" cache, and Entity API, doing funky cross dependent stuff with various info caches, needed a more granular clear). So it seems a valid expectation to keep those calls working, and we can do so with minimal efforts.

IMO, this doesn't deserve a mention in the release notes, but I guess that's up to @webchick and @David_Rothstein to decide.

PS : yay for D8 commit, thanks @swentel & @Stalski for the rerolls !

webchick’s picture

Assigned: Unassigned » David_Rothstein

This one has serious potential to break things on 7.x, so assigning to David to eyeball first.

alippai’s picture

guy_schneerson’s picture

Related issue #1810178 _field_info_collate_fields() is not localized
This requires language specific versions of the cache variables. It should not affect memory usage as long as global $language; dose not change through the lifetime of a single request.
Happy to close the related issue if this patch can address the issue or look at it again after this is committed.

David_Rothstein’s picture

Assigned: David_Rothstein » Unassigned
Status: Reviewed & tested by the community » Needs work

Thanks for all the work on this patch, as well as the extensive Drupal 8 reviews and Drupal 7 testing!

I'm a bit concerned that despite the fact that this has been filed against Drupal 7 for the last month there haven't really been any in-depth reviews of the Drupal 7 code itself. (Especially because from looking at the first 10 lines of the patch it's pretty clear that there is Drupal 8-specific stuff that carried over into it.)

Anyway, I spent a fair amount of time giving this a review myself. Overall it looks like a great improvement - while it's pretty clear this will hurt performance a bit in some cases, it seems like the huge memory gains (and performance improvements in other cases) do make it worth getting this into Drupal 7.

I separated my feedback below into a few categories based on severity. (I also tried to note places where my comments applied to the Drupal 8 code as well.) And I'd really feel more comfortable if other people gave this a mega-review as well. This is one of the bigger and more far-reaching patches we're ever going to commit to Drupal 7...

Commit-blockers and commit-blocking questions

  1. +  if (!isset($info)) {
    +    include_once(__DIR__ . '/FieldInfo.php');
    +    $info = new FieldInfo();
    +  }
    

    __DIR__ does not exist in PHP 5.2, so we can't use that. Also, huh? I'm pretty sure that whole include_once line can just be removed :)

  2.      // Verify that no unexpected instances exist.
         $instances = field_info_instances('test_entity');
    -    $expected = array('test_bundle' => array());
    +    $expected = array();
    ...
         // Test with an entity type that has no bundles.
         $instances = field_info_instances('user');
    -    $expected = array('user' => array());
    +    $expected = array();
    

    This throws up some red flags for me, since field_info_instances() is a public API function... Why is the expected output of field_info_instances() changing?

  3. I *really* can't see any use case for 3rd party code to use _field_info_collate_fields() (no param) directly, we have wrapper API funcs for that. Any module doing so could be rightfully considered broken, but I seriously doubt there is any.

    I looked through the codebase of a Drupal 7 site I had and found one right away:
    http://drupalcode.org/project/field_delimiter.git/blob/abca25a20b5e05b06...

    (Though granted it's an alpha release and the code is actually already gone in the latest dev version.)

    The Internet found me one more:
    http://drupalcontrib.org/api/drupal/contributions!cck!cck.module/function/cck_debug_field_info/7

    (Although yes, that's dev code also.)

    Anyway, this is not a public API function, but it's clear people are using it on occasion and especially for something as fundamental as Field API we try not change it in a stable release unless we have to (#181 points out that this function was changed in #1288364: Clean up reset patterns in field.info.inc but that looks like a Drupal 8 issue only).

    So my question is simply this: Would it really be hard to keep around? Wouldn't it mostly just be a wrapper around class methods that the patch is already adding?

  4. Similarly, the issue summary above says the following about the _field_info_prepare_*() helper functions:

    There might be valid edge cases for calling them from 3rd party, so the methods have been kept public, and the D7 patch can provide BC versions.

    Currently the D7 patch removes them outright, though. So, same question as above - is this necessary? I have no idea if people are using them or not, but again, it seems like it would be very very simple to just leave these around as simple wrappers on the new class methods.

Would be really, really good to resolve before commit

  1. + * Definition of Drupal\field\FieldInfo.
    

    (and tons of other places). These code comments are referring to the Drupal 8 class name, not the Drupal 7 one.

  2. --- /dev/null
    +++ b/modules/field/FieldInfo.php
    

    I'm not sure if we have an official standard for this, but I really haven't seen Drupal 7 (as opposed to Drupal 8) use this kind of file naming convention before. Shouldn't it be something like field.info.class.inc instead?

  3. @@ -337,8 +341,18 @@ function field_read_fields($params = array(), $include_additional = array()) {
     
       // Turn the conditions into a query.
       foreach ($params as $key => $value) {
    +    // Allow filtering on the 'entity_type' and 'bundle' columns of the
    +    // field_config_instance table.
    +    if ($key == 'entity_type' || $key == 'bundle') {
    +      if (empty($fci_join)) {
    +        $fci_join = $query->join('field_config_instance', 'fci', 'fc.id = fci.field_id');
    +      }
    +      $key = 'fci.' . $key;
    +    }
    +
         $query->condition($key, $value);
    

    This will cause fatal "ambiguous column name" errors in some cases, since a lot of columns have duplicate names between the two tables but the regular keys aren't being prefixed with "fc". (For example, try something like this: field_read_fields(array('entity_type' => 'node', 'field_name' => 'field_image')).)

    The only reason this isn't a strict commit blocker is that I suppose it can only occur if you're using the new functionality, but it would still be a really good idea to fix.

    Affects Drupal 8 also.

Other issues

  1. +  /**
    +   * Tests that the field info cache can be built correctly.
    +   */
    +  function testFieldInfoCache() {
    

    This looks like the tests from #1400256: Tests for: $info['fields'], not always set? but I don't see it in the Drupal 8 patch. Shouldn't it be forward-ported there (or is that what the other issue is being left open for?)

  2. + * Use of this function to retriev instances across separate bundles (i.e. when
    

    Typo ("retriev" => "retrieve").

    Affects Drupal 8 also.

  3. + * the request by the field_info_cache() function
    

    This function doesn't exist.

    Affects Drupal 8 also.

  4. +   * Retrieves all active, non-deleted instances definitions.
    +   *
    +   * This method does not read from nor populate the "static" and persistent
    +   * caches.
    ....
    +  public function getInstances($entity_type = NULL) {
    

    That code comment isn't true, is it?

    Affects Drupal 8 also.

  5. +   * Separately tracks requested field names or IDs that do not exists.
    

    Typo ("exists" => "exist").

    Affects Drupal 8 also.

  6. @@ -1717,6 +1717,11 @@ function hook_field_storage_details_alter(&$details, $field) {
      * objects in $entities. Fields with no values should be added as empty
      * arrays.
      *
    + * By the time this hook runs, the relevant field definitions have been
    + * populated and cached in FieldInfo, so calling field_info_field_by_id() on
    + * each field individually is more efficient than loading all fields in memory
    + * upfront with field_info_field_by_ids() (which is uncached).
    

    I'd probably move this to an inline comment (rather than part of the PHPDoc), no? It doesn't make much sense out of context.

    Affects Drupal 8 also.

  7. + * When retrieving the instances of a specific bundle (i.e. when both
    + * $entity_type and $bundle_name are provided, the function also populates a
    

    Missing parentheses after "provided".

    Affects Drupal 8 also.

jhodgdon’s picture

This patch has also had an "avoid commit conflicts" tag for months. Can that be removed?

geerlingguy’s picture

Issue tags: -Avoid commit conflicts

Yeah, I'm pretty sure that was really just for D8 HEAD tracking. We're past that point, and we just have some follow-up things to address.

jhodgdon’s picture

THANK YOU. I was getting tired of looking for the patch in #144 every time I needed to make a commit. :) [unsubscribing]

yched’s picture

@David_Rothstein:
Thanks for the thorough review. I'll try to update the D7 patch asap.

Meanwhile, I posted patches for the items you mention that apply to D8:
#1837018: Possible "ambiguous column name" error in field_read_fields - nice catch !
#1837022: Field API comment fixes

mgifford’s picture

Any update? Is the patch in #176 safe to apply to a production environment yet?

We're running into some nasty looping that we're hoping to resolve.

mgifford’s picture

Status: Needs work » Needs review

#176: 1040790-176.patch queued for re-testing.

geerlingguy’s picture

#144 is the lastest D7 version of the patch, and it's working very well on one production site, keeping page loads down to 10-20 MB (were 160 MB or more). Cache clears cause a long rebuild, though.

yched’s picture

So, yeah, sorry for not reporting earlier.
David told me he is not planning a new D7 "regular" (non-security) release before January, so I'm focusing on D8 work until feature freeze, and will come back at this after Dec 1st.

yched’s picture

Getting back at this.

Attached patch fixes most of @David_Rothstein's remarks in #187.
I'll talk about the remaining points in a followup comment (probably tomorrow, though).

- Removed the implicit include before using the FieldInfo class in _field_info_field_cache()
Not having it caused fatal errors in tearDown() on all upgrade tests.
This gets fixed in UpgradePathTestCase : its setUp() unregisters the code registry, so its tearDown() needs to re-register it before running regular cleanup code.
Also, added a dummy update function to force a registry rebuild.

- Added back BC functions for the _field_info_prepare_*() helper functions.

- Moved the class file from modules/field/FieldInfo.php to modules/field/field.info.class.inc.
Got rid of the other D8-isms.

- Fixed the possible "ambiguous column name" errors in field_read_fields(), and added tests (D8 fix got in meanwhile : #1837018: Possible "ambiguous column name" error in field_read_fields)

- Fixed the doc remarks (D8 fix got in meanwhile : #1837022: Field API comment fixes)

yched’s picture

Spotted after I uploaded the patch :
- one extra newline before FieldCrudTestCase::testReadFields()
- assert messages in newly added tests were not wrapped in t()

Fixed locally, will be in the next reroll.

Status: Needs review » Needs work

The last submitted patch, field-info-cache-1040790-196-D7.patch, failed testing.

yched’s picture

Status: Needs work » Needs review
FileSize
59.18 KB

So, turns out that the fixes in UpgradePathTestCase() in #196 still leave *some* upgrade tests failing with a fatal if we solely rely on the class registry to do the autoloading. I tried to investigate a bit, but I can't make sense of this for now.

Thus, added back an explicit include for the class file (with a @todo) before instanciating the FieldInfo object - and reverted the UpgradePathTestCase changes.

yched’s picture

So, the remaining points in @David_Rothstein's #187 were:

- field_info_instances() output slightly changing :

Previously, _field_info_collate_types() initialized the 'instances' array (the one returned by field_info_instances()) to an empty array for all existing entity types and bundles. So you could do something like:

$instances = field_info_instances();
foreach ($instances[$whatever_entity_type][$whatever_bundle] as $instance) {
  // ...
}

with no PHP notice, without checking that the entries existed.

We stopped doing that in the D8 version, because this means calling entity_get_info() when rebuilding field_info and causes nasty cache rebuild dependencies.
Also, the above code pattern is actually not the typical use of field_info_instances() - if you want the instances of a specific entity_type and bundle, you'll use field_info_instances($entity_type, $bundle), not field_info_instances(give me all) - when using the latter, you typically iterate on the entries in the resulting array.

But, well, sure, I guess we cannot really change that in D7.
Attached patch reintroduces the the previous behavior (based on entity_get_info()) in FieldInfo::getInstances() - the regular getBundleInstances($entity_type, $bundle) does not rely on entity_get_info().
Patch also reverts the corresponding test changes.

- Should we keep an emulated _field_info_collate_fields() around ?

Well, I'm still convinced that any code calling this function directly is buggy and at fault.
If we can't remove that one, then it means our 'no API breaks" policy extends to all existing functions, and that there is no such thing as an "underscore prefix" ?
But well, I'm not in the shoes of a major version maintainer, and I fully respect your concerns :-).

Attached patch emulates the previous behavior of _field_info_collate_fields() - it's going to be a performance drag though...

Additionally:
- fixed a minor glitch in FieldInfo::getField() (D8 got fixed in #1831202: wrong param for field_read_field(), forgot to report it in the backport)
- found & fixed a minor bug in FieldInfo::getBundleInstances(), incorrectly caches bundles with no instances, the info is never statically cached, gets re-read from persistent cache each time. Since this deals with internal caching in protected members, this is not really testable. Will need to get fixed in D8 as well, I'll open a separate issue.

And well, this, my friends, is comment #200 :-/

yched’s picture

From #200 above:

- found & fixed a minor bug in FieldInfo::getBundleInstances(), incorrectly caches bundles with no instances, the info is never statically cached, gets re-read from persistent cache each time. Since this deals with internal caching in protected members, this is not really testable. Will need to get fixed in D8 as well, I'll open a separate issue.

Posted D8 patch at #1864720: Minor hysteresis in field_info_instances(). While doing so, I found out that the bug is actually observable and testable.
Attached patch adds the same test than in the D8 patch (also slightly adjusts the comment around the bugfix).

valthebald’s picture

Status: Needs review » Reviewed & tested by the community

Works for D8. Testbot likes it. Checked locally. May I mark this RTBC?

YesCT’s picture

@valthebald did you mean D7?

valthebald’s picture

@YesCT: sure. I applied patch from #201 to the latest 7.x sandbox

obrienmd’s picture

Works for me for D7 install w/ lots of custom modules interacting w/ field API and entity API.

I see a 2-3% performance increase on front page w/ 10 nodes and 3-5% on node/%nid.

klausi’s picture

+++ b/modules/field/field.info.class.inc
@@ -0,0 +1,675 @@
+    $query = db_select('field_config_instance', 'fci');
+    $query->join('field_config', 'fc', 'fc.id = fci.field_id');
+    $query->fields('fc', array('type'));
+    $query->fields('fci', array('field_name', 'entity_type', 'bundle'))
+      ->condition('fc.active', 1)
+      ->condition('fc.storage_active', 1)
+      ->condition('fc.deleted', 0)
+      ->condition('fci.deleted', 0);

This is not a dynamic query, so it should use db_query() instead of db_select(). Minor performance improvement (avoids a lot of expensive PHP function calls), but it should not matter here since this is not called very often anyway.

We really need to update the documentation page on dynamic queries, since in 90% of the cases db_select() should not be used. http://drupal.org/node/310075

geerlingguy’s picture

Some test results (All tests were averaged over three runs on a site with 10 entity types and 101 field instances.)

Before patch:

cache_field table size: 272KB
Node listing page (with about 50 nodes): 509ms / 15.75MB RAM
Node listing page cache clear: 5676ms / 37.75MB RAM
Single node page: 342ms / 15.25MB RAM

After patch:

cache_field table size: 1.5MB
Node listing page (with about 50 nodes): 543ms / 14.5MB RAM
Node listing page cache clear: 7369ms / 39MB RAM
Single node page: 341ms / 13MB RAM

No problems with the patch (this is run on a dev instance of a live site), and the memory savings are apparent. The timings fluctuated a bit, so the accuracy isn't that great. Definitely no regression, except during a cache clear (actually, the rebuild is what takes most of the time).

Leaving at RTBC, and will do a quick re-roll with the query quoted above reworked into db_query() (I agree that there's no reason for db_select() on that one).

geerlingguy’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
38.7 KB

Attached patch converts the query from field.info.class.inc to db_query(). I was going to format it like below, but it seems the core convention is to spell out the entire query on one super long line, so that's how I made the patch. Is there any guidance on this?

    $query = db_query('SELECT fc.type, fci.field_name, fci.entity_type, fci.bundle
      FROM {field_config_instance} fci
      LEFT JOIN {field_config} fc ON fc.id = fci.field_id
      WHERE fc.active = 1
        AND fc.storage_active = 1
        AND fc.deleted = 0
        AND fci.deleted = 0');

[Edit: According to berdir in IRC, there's no specific guideline, but single-line queries are normal. Just goes against my OCD tendencies :)]

bojanz’s picture

The latest patch is missing some files.

Tested #201 on a fresh Kickstart 2.0 install.

Before:
All products view: Page execution time was 1548.19 ms. Memory used at: devel_boot()=0.81 MB, devel_shutdown()=18.2 MB, PHP peak=18.75 MB.
Frontpage: Page execution time was 676.57 ms. Memory used at: devel_boot()=0.81 MB, devel_shutdown()=14.7 MB, PHP peak=15 MB.
admin/content view: Page execution time was 551.83 ms. Memory used at: devel_boot()=0.81 MB, devel_shutdown()=13.86 MB, PHP peak=14.25 MB.

After:
All products: Page execution time was 1805.92 ms. Memory used at: devel_boot()=0.81 MB, devel_shutdown()=17.81 MB, PHP peak=18.25 MB.
Frontpage: Page execution time was 649.89 ms. Memory used at: devel_boot()=0.81 MB, devel_shutdown()=13.69 MB, PHP peak=14 MB.
admin/content: Page execution time was 580.78 ms. Memory used at: devel_boot()=0.81 MB, devel_shutdown()=12.81 MB, PHP peak=13 MB.

Execution times vary wildly and shouldn't be trusted. Memory savings were consistent.
I should also note that Commerce based sites will see a bigger improvement once we commit #1809916: Decrease memory usage by loading fields more selectively to Commerce

I've also tested installing the distro with the patch applied. Found no problems.

geerlingguy’s picture

Yikes! Thanks for catching that, bojanz, forgot to add files to staging area before creating patch. Attached patch should be complete.

yannisc’s picture

Applied patch #210 cleanly. I get no errors or warnings. For a view with a lot of block views, I got 1.58 reqs/sec with the patch, instead of 1.61 reqs/sec without the patch. No cache, no APC enabled.

catch’s picture

A view's not a particularly good test case for this. Just a node/n page generally works best to see the difference.

alexrayu’s picture

Tested #210 in WAMP. Site with lots of views, and field collections having fields in each. The patch yielded about 2% memory usage improvement, not distinct database usage improvement, but very high CPU usage resulting in about 1.5 longer page load time. Maybe it's WAMP-specific though.

As for errors and API, all went smooth, the patch brought no errors or any problems in actual functionality or API compatibility.

yannisc’s picture

In a node page with 5 fields, I got from 4.01 reqs/sec to 4.09 reqs/sec, logged in as admin.

slashrsm’s picture

Tested #210 at fairly big media site (195 field instances, 70k articles). Mem usage on frontpage (a LOT of content) went down for ~2,5% and on node page for ~5,5%. CPU usage is slightly lower (but not significant).

Applied cleanly, no errors appeared.

Site uses quite a lot contrib/custom modules. Another level of improvement could be achieved by updating their code too. Did not test that though.

heddn’s picture

Status: Needs review » Needs work

Tested #210 on front page of a news media site. Memory usage on front page went down ~12%. Page execution time (single user testing) were roughly 25-100ms faster on page refreshes. There's a lot of external js, etc that gets loaded so page execution time varied widely but was consistently faster.

old-side - page refresh
Memory used at: devel_boot()=5.69 MB, devel_shutdown()=41.37 MB, PHP peak=42 MB.
old-side - cache clear
Memory used at: devel_boot()=6.01 MB, devel_shutdown()=69.7 MB, PHP peak=91.5 MB.

new-side - page refresh
Memory used at: devel_boot()=5.69 MB, devel_shutdown()=36.58 MB, PHP peak=37.25 MB.
new-side - cache clear
Memory used at: devel_boot()=6.01 MB, devel_shutdown()=69.11 MB, PHP peak=90.75 MB.

Marking as needs work though because I tried testing several nodes. The body content of a node is universally empty. Not good. By the time we get to hook_node_view(), it doesn't exist. Not sure what in this patch caused the behavior but removing the patch causes the site to return to normal.

heddn’s picture

Status: Needs work » Needs review

Bad me, we also apply #955658-42: Undefined index: module in FieldInfo::prepareInstanceDisplay() and I'd gone ahead and I applied it after applying this patch. I didn't apply it correctly. Moving back to needs review.

becw’s picture

#210 works for me but provides limited reductions in overall memory usage (.75 to 1 MB) on a site of medium complexity. For example:

Page execution time Memory at devel_boot() Memory at devel_shutdown() PHP peak
Unpatched, after cache clear
4147.34 ms 7.67 MB 117.67 MB 118.5 MB
4121.78 ms 7.67 MB 117.67 MB 118.5 MB
4278.69 ms 7.67 MB 117.67 MB 118.5 MB
3962.14 ms 7.67 MB 117.67 MB 118.5 MB
4331.27 ms 7.67 MB 117.67 MB 118.5 MB
Unpatched, cached load
1067.98 ms 7.68 MB 82.7 MB 83.25 MB
831 ms 6.7 MB 82.7 MB 83.25 MB
796.29 ms 6.7 MB 82.69 MB 83.25 MB
791.46 ms 6.7 MB 82.69 MB 83.25 MB
799.9 ms 6.7 MB 82.69 MB 83.25 MB
Patched, after cache clear
4904.86 ms 7.64 MB 117.65 MB 118.75 MB
4626.93 ms 7.67 MB 117.68 MB 118.75 MB
4728 ms 7.67 MB 117.68 MB 118.75 MB
4194.17 ms 7.67 MB 117.68 MB 118.75 MB
4484.48 ms 7.67 MB 117.68 MB 118.75 MB
Patched, cached load
924.34 ms 7.68 MB 81.84 MB 82.25 MB
821.19 ms 6.7 MB 81.84 MB 82.25 MB
812.45 ms 6.7 MB 81.84 MB 82.25 MB
803.8 ms 6.7 MB 81.84 MB 82.25 MB
794.72 ms 6.7 MB 81.84 MB 82.25 MB

I tested the patch on a node page of a site with 70 fields and 136 field instances. I ran each page several times, since there were minor variations in memory usage and I wanted to watch for changes in the execution times. It seems like this patch will have a limited performance impact on sites of this size.

Thinking this was due to the small size, I ran the my test script on a node page of a larger site (which admittedly has some other performance issues). This site has 125 fields and 252 field instances; applying the patch increased memory usage by a small amount:

Page execution time Memory at devel_boot() Memory at devel_shutdown() PHP peak
Unpatched, after cache clear
7659.03 ms 10.33 138.79 140.5
6586.01 ms 10.33 138.79 140.75
7376.61 ms 10.33 138.79 140.75
6623.6 ms 10.33 138.79 140.5
6483.26 ms 10.33 138.79 140.75
Unpatched, cached load
1605.66 ms 10.37 96.05 96.5
1411.15 ms 9.84 96.05 96.5
1473.85 ms 9.84 96.05 96.5
1418.82 ms 9.84 96.05 96.5
1441.56 ms 9.84 96.05 96.5
Patched, after cache clear
8277.51 ms 10.33 139.2 141
8293.89 ms 10.33 139.2 141
9005 ms 10.33 139.2 141
7630.76 ms 10.33 139.2 141
8034.94 ms 10.33 139.2 141
Patched, cached load
1544.95 ms 10.37 96.2 96.75
1446.83 ms 9.84 96.2 96.75
1469.54 ms 9.84 96.2 96.75
1477.64 ms 9.84 96.2 96.75
1462.12 ms 9.84 96.2 96.75

Is the intent of this patch to reduce the memory usage of _field_info_collate_fields() specifically, or of Drupal overall? Because while it may be succeeding at the first, it does not have a significant positive impact on the overall memory usage in the tests I have run.

mariusdiacu’s picture

I tried the patch with a content type with about 60 fields and 12 field collection with 4-5 fields in or even field collection in field collection. Cache is on memcached. For me the improvements was really small.

Berdir’s picture

For those not seeing improvements despite having quite a few fields, you might have contrib/custom modules which are calling field_info_instances() without type and bundle or field_info_fields(). (Like commerce does currently, as pointed out be @bojanz)

Right now, that doesn't matter, everything is loaded anyway, so that call is almost free. However, with the patch, that changes and that information is no longer kept in a global cache and needs to be rebuilt every time it is requested. I think that is what even slows down some sites.

yched’s picture

Thanks to the many people that provided feedback here !

I was just about to post exactly what @Berdir #220 said. The memory gains will be mitigated/nullified by "give me all" field_info_[instances|fields]() calls in contrib module:
- field_info_instances() with $bundle param == NULL
- field_info_fields()
The whole point of the patch is to only load stuff that is needed - i.e the list of fields and instances on a specific bundle.
However, calls to one of the above loads everything in memory, and statically persists it, as in current D7, so as to not degrade performance of existing code. The only difference is that the "whole" info is not automatically loaded in memory on each page request anymore, but read from cache and statically persisted on first call.

The real use cases for field_info_instances("all") are not that frequent, but since the info was available at no cost so far, contrib modules liberally used it instead of more targeted calls.

There's not much that can be done about that, aside from slowly fixing contrib modules to get rid of those calls.
Most uses of field_info_instances("all") can generally be avoided, either:
- by using a slightly different logic (restricting to a series of specific bundles),
- or by using the newly added field_info_field_map(), which provides a lightweight map of fields across entity bundles. Contrib modules would then probably need to use an if (function_exists('field_info_field_map')) { optimized version } else { old, non optimized version } construct to preserve compatibility with both pre-patch & post-patch core codebases.

tsvenson’s picture

@yched: (thinking out loud) Would it be possible to task the coder module to help identifying and maybe even assist on rewriting contribs that uses the wrong functions?

DamienMcKenna’s picture

@tsvenson: The issue summary already includes sample code to that should be used to update existing modules, we just need to identify modules that use the old code pattern and submit patches to improve them.

DamienMcKenna’s picture

I just looked through some modules I had checked out locally and found the following:

  • Many modules use the old code pattern as part of their installation routines or update hooks (Media update 7201, AddressField update 7000, Brightcove update 7201, Commerce 'price' submodule 7100, Filefield Paths 7107).
  • The following all used "field_info_fields()" in one way or another: Album Uploader, Commerce, Commerce Devel, CTools, Date, DefaultContent,, Drush, Features (v1 and v2), Field Collection, File Entity, FileField Paths, Location, Mast, Simple Contest, Token, Views.

In almost all cases the usage was quite simple and it wouldn't take much effort to update them.

brianV’s picture

To add to the list in 224 from looking at one of our larger sites:

Invoke field_info_fields():

Apachesolr, Addressfield, ctools, custom_formatter, date, diff, entityreference, field_collection, file_entity, media, geofield, media, microdata, nodequeue, token, views.

Invoke field_info_instances() w/ no arguments or null $bundle:

Acquia Connector, Apachesolr, calendar, ctools, date, devel, entityreference, microdata, style_usage, token, views

Maybe we can create a canonical list, and use it to create a meta-issue to track module-specific issues that address this?

brianV’s picture

I've begun a Google doc at https://docs.google.com/spreadsheet/ccc?key=0AjNi8SoLpdaQdEFjaV9jY3ZwYkl... which tracks the modules reported which need to be adjusted to take advantage of this patch. So far, includes entries from #224 and #225.

Please update it with any modules you are aware of which will need to be updated to take advantage of this patch.

becw’s picture

I don't see broad usage of field_info_fields() and field_info_instances() without arguments in the contrib modules that are installed on the sites I tested against. On the first site, the following modules call one of those functions without arguments outside of the install hooks:

  • ctools
  • date
  • devel
  • entity
  • features
  • feeds
  • location
  • metatags_quick
  • rules
  • token
  • views

These modules have one or more calls to the field_info_(fields|instances) functions without arguments, but none of the calls are in the execution path on my node pages.

becw’s picture

Issue summary: View changes

formatting

yched’s picture

@becw: field_info_instances($entity_type) (one single argument) is also impacted.
Just as a quick test, I'd suggest adding this at the beginning of field_info_instances() :

if ($bundle_name === NULL) {
  dpm('Expensive call to field_info_instances()');
  return array();

and re-test the "patched, cached load" part - some stuff might stop working as they should of course, but just to get an idea.
Thanks for the extensive test, BTW !

@DamienMcKenna, @BrianV:
Sounds like an excellent idea !

Bottomline is: there should be very few real cases for iterating on the list of *all* fields and instances in a "regular" page flow (maybe less true while rebuilding some cached data or participating in an info hook).

Cheap:
- field_info_field_map()
- field_info_instances($entity_type, $bundle)
- field_info_field($field_name) & field_info_field_by_id($field_id), if the field appears in a bundle on which field_info_instances($entity_type, $bundle) has already been called - otherwise it's O(n) on the number of different fields loaded.
(i.e things were optimized for the most common pattern of :
foreach (field_info_instances($type, $bundle) as $instance) { $field = field_info_field($instance['field_name']); } )
- field_info_instance($entity_type, $field_name, $bundle) - same as above, cheap if comes after field_info_instances($entity_type, $bundle), O(n) if not.

Costly:
- field_info_fields(), field_info_field_by_ids()
- field_info_instances(), field_info_instances($entity_type) - extra costly, also loads field data.

We might end up completely removing/renaming the "costly" functions in D8 to make that clearer, but of course that's not an option for D7 :-(

tim.plunkett’s picture

I used the following to find field_info_instances() and field_info_instances($entity_type):

grep -nr "field_info_instances([^),]*)" *

DamienMcKenna’s picture

@tim: Nice regex :)

I identified a few more modules with that: Display Suite, References Manager (that's mine), UserReference URL.

DamienMcKenna’s picture

So it seems like we honestly need to first patch a few modules and then compare the numbers.

bojanz’s picture

Status: Needs review » Reviewed & tested by the community

Let's start by getting the patch committed.

We've all confirmed that it works correctly and provides memory savings even without contrib patching.

yched’s picture

Status: Reviewed & tested by the community » Needs review

So it seems like we honestly need to first patch a few modules and then compare the numbers

I don't think so. We're not trying to benchmark the patch and see if the changes make sense. This has been established already, and I don't think we can squeeze more perf gains there or take a different approach.

The situation is :
- the patch (strongly) optimizes the good (and most frequent) cases.
- there are some real use cases that cannot be optimized (those that need to iterate the full list of fields or instances), but those should not be the majority, and the patch shouldn't make them perform worse.
- there is a (possibly non-minor) amount of contrib code that currently rely on that non-optimizable logic, and some / most of it could be refactored to use more performant logic.

yched’s picture

Status: Needs review » Reviewed & tested by the community

crosspost :-)

David_Rothstein’s picture

I looked at the interdiff between #144 and #210 and it looks great. Just a couple comments; none are commit blockers except (possibly) the first one:

  1. -    $query = db_select('field_config_instance', 'fci');
    -    $query->join('field_config', 'fc', 'fc.id = fci.field_id');
    -    $query->fields('fc', array('type'));
    -    $query->fields('fci', array('field_name', 'entity_type', 'bundle'))
    -      ->condition('fc.active', 1)
    -      ->condition('fc.storage_active', 1)
    -      ->condition('fc.deleted', 0)
    -      ->condition('fci.deleted', 0);
    -    foreach ($query->execute() as $row) {
    +    $query = db_query('SELECT fc.type, fci.field_name, fci.entity_type, fci.bundle FROM {field_config_instance} fci LEFT JOIN {field_config} fc ON fc.id = fci.field_id WHERE fc.active = 1 AND fc.storage_active = 1 AND fc.deleted = 0 AND fci.deleted = 0');
    +    foreach ($query as $row) {
    

    Why did this switch from an inner join to a left join? (In practice it might not matter much, but inner join looked correct here.)

  2. -    $this->assertTrue(isset($fields[$field_name]), 'The test field is initially found in the array returned by field_info_fields().');
    +    $this->assertTrue(isset($fields[$field_name]), t('The test field is initially found in the array returned by field_info_fields().'));
    

    The new patch seemed to introduce lots of these (both in new code and code that already existed in the earlier patch)... we should clean that up in a followup.

  3. + * See http://drupal.org/node/1040790.
    

    Everywhere the patch says that, we should probably switch to refer to the change notification (once one exists).

***

Meanwhile, as for all the testing people have been doing, that looks really great. I get the overall impression that the patch is behaving as expected, but there are a few outliers in some of the reports above...

It actually seems like it's been a long while since anyone did any "official" benchmarking (like with ApacheBench, to get statistically significant results) in this issue; not sure if anyone has done any on a recent patch. I don't think it's a requirement for commit necessarily, but I'd be very curious (and perhaps curious enough to do it myself) what the results are of a few "theoretical" extremes. For example:

  1. Single node page view on a site with lots of content types containing lots of unique fields.
  2. Single node page view on a site with a single content type and a moderate number of fields.
  3. Same as #1, but with core hacked to call field_info_fields() and field_info_instances() on every page request (simulating a contrib module that might do that).

My understanding is we expect the patch to provide a big memory improvement for #1 (and maybe even an improvement in page execution time too) and not much of an effect either positive or negative for #2 and #3...

David_Rothstein’s picture

While looking this over I also tried to track things that aren't in D8 yet, since we should make sure there are followups for all of them.

Here's what I found:

  1. -    $query = db_select('field_config_instance', 'fci');
    -    $query->join('field_config', 'fc', 'fc.id = fci.field_id');
    -    $query->fields('fc', array('type'));
    -    $query->fields('fci', array('field_name', 'entity_type', 'bundle'))
    -      ->condition('fc.active', 1)
    -      ->condition('fc.storage_active', 1)
    -      ->condition('fc.deleted', 0)
    -      ->condition('fci.deleted', 0);
    -    foreach ($query->execute() as $row) {
    +    $query = db_query('SELECT fc.type, fci.field_name, fci.entity_type, fci.bundle FROM {field_config_instance} fci LEFT JOIN {field_config} fc ON fc.id = fci.field_id WHERE fc.active = 1 AND fc.storage_active = 1 AND fc.deleted = 0 AND fci.deleted = 0');
    +    foreach ($query as $row) {
    

    Doesn't seem to be in D8... is there a followup issue yet?

  2. diff -u b/modules/field/field.api.php b/modules/field/field.api.php
    --- b/modules/field/field.api.php
    +++ b/modules/field/field.api.php
    @@ -1717,11 +1717,6 @@
      * objects in $entities. Fields with no values should be added as empty
      * arrays.
      *
    - * By the time this hook runs, the relevant field definitions have been
    - * populated and cached in FieldInfo, so calling field_info_field_by_id() on
    - * each field individually is more efficient than loading all fields in memory
    - * upfront with field_info_field_by_ids() (which is uncached).
    - *
      * @param $entity_type
      *   The type of entity, such as 'node' or 'user'.
      * @param $entities
    @@ -1743,6 +1738,10 @@
       $load_current = $age == FIELD_LOAD_CURRENT;
     
       foreach ($fields as $field_id => $ids) {
    +    // By the time this hook runs, the relevant field definitions have been
    +    // populated and cached in FieldInfo, so calling field_info_field_by_id()
    +    // on each field individually is more efficient than loading all fields in
    +    // memory upfront with field_info_field_by_ids() (which is uncached).
         $field = field_info_field_by_id($field_id);
         $field_name = $field['field_name'];
         $table = $load_current ? _field_sql_storage_tablename($field) : _field_sql_storage_revision_tablename($field);
    
    .....
    
    diff -u b/modules/field/modules/field_sql_storage/field_sql_storage.module b/modules/field/modules/field_sql_storage/field_sql_storage.module
    --- b/modules/field/modules/field_sql_storage/field_sql_storage.module
    +++ b/modules/field/modules/field_sql_storage/field_sql_storage.module
    @@ -327,6 +327,10 @@
       $load_current = $age == FIELD_LOAD_CURRENT;
     
       foreach ($fields as $field_id => $ids) {
    +    // By the time this hook runs, the relevant field definitions have been
    +    // populated and cached in FieldInfo, so calling field_info_field_by_id()
    +    // on each field individually is more efficient than loading all fields in
    +    // memory upfront with field_info_field_by_ids() (which is uncached).
    

    None of this seems to be in D8... is there a followup issue yet?

    By the way... is "which is uncached" actually correct here?

  3. +  /**
    +   * Tests that the field info cache can be built correctly.
    +   */
    +  function testFieldInfoCache() {
    

    As mentioned in an earlier comment, doesn't seem to be in D8 yet. I'm just going to repurpose #1400256: Tests for: $info['fields'], not always set? for that after the patch here goes in.

hass’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs change record
geerlingguy’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs change record
FileSize
7.42 KB
60.41 KB
  1. INNER to LEFT join - was completely unintentional. Fixed in attached patch.
  2. Test assertions with t() wrappers - looks like that leaked in at some point earlier in the issue. I've remove the t() from them (note that they still exist elsewhere in that test file).
  3. Created change notification: http://drupal.org/node/1880666 (it's not complete, so we need to complete it after the patch is committed).

As far as the D8 issues go, I've opened #1880682: Convert dynamic query in FieldInfo::getFieldMap() to static query for the query, but maybe we could wrap the other changes in another issue?

Jorrit’s picture

andypost is right in #5: file_get_file_references() needs to be rewritten. What it basically needs is a field_info_field_by_type() kind of function to retrieve all fields of a certain type. I also need this for the Video module, where I do something similar as file_file_download() in video_file_download().

yched’s picture

- Thanks @geerlinguy for the fixes / rerolls !

re @David_Rothstein #236:
- Correct, the "(which is uncached)" comments are not relevant anymore.
Fixed in attached patch.
- Will create a D8 issue to forward-port the other bits.

re @Jorrit #239:
- file_get_file_references() : Right. Created #1880884: Optimize field / instance loading in file_get_file_references() .
- field_info_field_by_type() : I'm not sure we'll provide that, but the new field_info_field_map() would get you 90% of the way there.

yched’s picture

Crap, forgot the patch.
Interdiff with #238.

Jorrit’s picture

#240: field_info_field_map() is sufficient, you're right about that. The changes are small.

marcingy’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs change record

Have tested this on a large drupal site with many nodes and with 295 fields and 657 field instances and am seeing a 5%-10% reduction in memory usage. No other issues with the site were uncovered so this certainly seems to do what it says on the tin.

yched’s picture

Status: Needs review » Reviewed & tested by the community

back to RTBC ?

mgifford’s picture

David_Rothstein’s picture

Title: _field_info_collate_fields() memory usage » Change notice: _field_info_collate_fields() memory usage
Category: bug » task
Priority: Major » Critical
Status: Reviewed & tested by the community » Active
FileSize
599 bytes
1.25 KB
1011 bytes
1.06 KB
1.26 KB
1.26 KB
1.26 KB
1.26 KB
1.26 KB
1.26 KB

OK, this looks good - thanks everyone! Committed to 7.x: http://drupalcode.org/project/drupal.git/commit/bd1dc1f

This needs the placeholder change notification at http://drupal.org/node/1880666 filled in.

***

To be extra certain this was OK, I decided to run benchmarks for the scenarios I listed in #235 before committing this, specifically:

  1. Single node page view on a site with 20 content types containing 20 unique fields per content type.
  2. Single node page view on a site with a single content type containing 20 fields.
  3. Same as #1, but with core hacked to call field_info_fields() and field_info_instances() on every page request (simulating a contrib module that might do that).

All the scripts I used and the ApacheBench results are attached to this comment. I was getting some large standard deviations but overall the patch seems to behave as desired. Scenario 2 did have a statistically significant increase in page execution time of a few percent, while the others either stayed the same or got better. Memory results, meanwhile, were consistent with large improvements that others have reported for scenario 1 (PHP peak memory dropped from 33.25 MB to 24 MB with the patch) and no real change for either scenario 2 or scenario 3.

So, I felt this was committable because the kinds of sites that are likely to care about a few percentage points page execution time increase (as seen in scenario 2) are also least likely to find themselves in that scenario in the first place (they are more likely to be large and complex and therefore benefit the most from the other improvements in this patch). Overall, therefore, it seemed like a big win for Drupal 7.

We still have over a week to go until the next release window (February 6) in case anyone finds anything wrong with this in the meantime; I'm also toying with just waiting until March (for other reasons besides this issue, but for this issue it would give some time for people to write patches for contrib modules to optimize their behavior for this change and get more sites over to something resembling scenario #1).

Thanks again!

David_Rothstein’s picture

As for things that stil need to be forward-ported to Drupal 8, we have issues for two of them:
#1400256: Tests for: $info['fields'], not always set?
#1880682: Convert dynamic query in FieldInfo::getFieldMap() to static query

I think we still need an issue for porting #236.2 to Drupal 8 though (with the "which is uncached" part removed).

yched’s picture

Title: Change notice: _field_info_collate_fields() memory usage » _field_info_collate_fields() memory usage
Category: task » bug
Priority: Critical » Major
Status: Active » Fixed

Sorry for the delay, been stuck out of the issue queue lately, slowly getting back.
Yay for the commit !

Posted a change notification in http://drupal.org/node/1915646.
I tried to be as clear and precise as I could, but the result is probably too verbose - could use a proof-reading and possibly some edits.

[edit: deleted previous record (swentel)]

podarok’s picture

Yay!
At last!!!

Thanks!!!!!

yched’s picture

Updating tags.

yched’s picture

Ouch - the placeholder change notification node was linked in several place in the code.
But I superbly ignored it, created a new change notification, and we deleted the old one.

To avoid adding confusion here, patch posted in #1920340: Update URL of change notification for "_field_info_collate_fields() memory usage".

David_Rothstein’s picture

Thanks, I'll get that one committed soon!

The change notification looks pretty good to me. I went through it a little earlier and changed all references from Drupal 7.20 to Drupal 7.21 (since Drupal 7.20 was a security release only).

Making the same change here too...

David_Rothstein’s picture

Hm, thought I changed the tag... trying again.

yched’s picture

@David_Rothstein :

I think we still need an issue for porting #236.2 to Drupal 8 though (with the "which is uncached" part removed).

Yup, posted a D8 patch in #1922892: Last missing bit from D7 "_field_info_collate_fields()" patch

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

David_Rothstein’s picture

Issue tags: +7.22 release notes

Fixing tags since Drupal 7.21 only contained a fix to deal with fallout from the Drupal 7.20 security release. Hopefully this is really the last time I do this and this will be released in Drupal 7.22 for real :)

David_Rothstein’s picture

.

pounard’s picture

Something bothers me a bit with this patch: in certain edge cases (complex sites with *a lot* of fields) it makes *a lot* of cache backend queries. Don't bother trying to fix this, in this case I guess memory is more important than the cache backend. I just wanted to point that this optimisation does have a downside. I just wanted to say it.

yched’s picture

@pounard: yes, it's been clear from the start that there would be tradeoffs - there has to be.

Not sure what qualifies as "*a lot* of cache backend queries", but the code that got committed results in one cache query per entity bundle involved in the page request (instead of one single cache query previously). That's not strictly related to the number of overall fields & instances on the site, but rather to the "complexity" of the page (as in # of different "kinds of stuff" loaded on for the page).

For sites with many fields and instances the memory gain should definitely outweight the cache reads. The bad case would be pages involving lots of different entity types & bundles, on sites with a moderate # of fields, but this was deemed a reasonable tradeoff - i.e the perf loss in that case would be limited, and more likely happen on sites where performance is probably not too much of a critical issue, while the perf gains are huge for sites with lots of fields.

pounard’s picture

The bad case would be pages involving lots of different entity types & bundles, on sites with a moderate # of fields

Here is a simple scenario: using profile2 to store complex user profiles, with many profiles carrying many fields that ends being loaded because you have an authenticated user, viewing a node with some node references or entity references (or simply a product page using commerce) with a cart block on the page and some other nice blocks showing entities: you can easily end up loading half of the fields on the same page. This is actually not rare, but a pretty common Drupal scenario.

But yes, I understand the tradeoff, and that's still a good one.

plach’s picture

I think the following snippet from the change notice

<?php
foreach (field_info_field_map() as $field_name => $data) {
  if ($data['type'] == 'text') {
    $field = $fields[$field_name]; // No $fields defined!
    foreach ($data['bundles'] as $entity_type => $bundles) {
      foreach ($bundles as $bundle) {
        $instance = field_info_instance($entity_type, $field_name, $bundle);
        // Do something with the $field and $instance
      }
    }
  }
}
?>

should become

foreach (field_info_field_map() as $field_name => $data) {
  if ($data['type'] == 'text') {
    $field = field_info_field($field_name);
    foreach ($data['bundles'] as $entity_type => $bundles) {
      foreach ($bundles as $bundle) {
        $instance = field_info_instance($entity_type, $field_name, $bundle);
        // Do something with the $field and $instance
      }
    }
  }
}

Please, confirm :)

yched’s picture

Indeed. Fixed. Thanks !

hass’s picture

hass’s picture

Issue summary: View changes

Add link to Google spreadsheet which tracks modules to be updated to take advantage of this patch.