Problem/Motivation

views_fetch_data get the views table information. To not run all the hooks all the time there is a cache in
the database with the cid 'views_data' , which is loaded on every page with a view.

The problem is that the result is getting really big really fast, so on some sites
it is over 10MB.

Proposed resolution

Most of the time you don't have to load all the information, but just the information from one certain table.

If the $table parameter is specified just load the data from a cache specific for the single table.
At the same time write one additional cache entries for each tables, but keep the one for all
tables.

Remaining tasks

  • General documentation of the new code
  • If it is possible tests.

Original report by [username]

I'm working on a site with dozens of views and a need to scale to hundreds of near-simultaneous authenticated users. One of the memory hogs we're seeing in xhProf is the cache object loaded by views_fetch_data() and views_cache_get(‘views_data:en’), which seem to run every time any view is loaded on the page. That cache object (loaded from memcache) takes up 11 MB per request, so at the scale we want to reach, that's multiple gigabytes just for that item.

... some more, just asking why this is done.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

merlinofchaos’s picture

This is unfortunately normal. For a view to be built, Views has to know all of the data it might pull from, and it stores it as one big bucket -- really it's the result of hook_views_data().

This is a known issue with Views but there's really no way to change this without some pretty significant work and most of the attempts to save memory would come with increases in cache loading overhead and ultimately might not end up saving that much.

dawehner’s picture

In php5 we could do some kind of pseudo array object, which lazy-loads the needed data per table.
To be able to do this the cache entries would have to be per base table.

The problem though is that if you have a view with many different tables / fieldapi fields, you end up with many cache_get's which is also bad, if not worse.

dawehner’s picture

Just had another idea: Store the views data per view.

It should be probably possible to find out all fields of views_data which are needed for a certain view.
There are probably problems with things like very custom filters/handlers, but maybe some kind of api could be exposed.
Based on this a cache entry per view would be stored, which should be MUCH smaller then a full one.

Maybe these two ideas could be combined as well.
In general i doubt that such kind of work will happen on 2.x...

swentel’s picture

Version: 6.x-2.9 » 7.x-3.x-dev
Category: support » feature
Status: Active » Needs review
FileSize
102.17 KB
72.46 KB
2.57 KB

I've been experiencing the same exact issue with a site I was profiling for heavy memory usage. views_fetch_data() was taking up 15mb per request, sometimes just to show only one single view. I've attached a patch which changes the _views_fetch_data() function storing every data per table in a single cid and an extra cid called 'base_tables', only storing info about the base tables. The patch now only loads the views data for the view it needs on that request. While this seems overhead, we can (safely?) assume that in case you have much views (cf underneath), you won't probably put them all on one page. If so, then I'd rather think there's a structural site building problem. And with memcache you can also make your database server even more happier. And with the patch, everything seems to be working still nicely.

Context of the site:
- 26 views, 37 displays
- 73 fields used in views
- 138 fields
- 219 field instances
- 26 content types

The memory drop underneath is done on a page which loads 3 views every single request (it's looking for nid in the url to create 'related' info blocks).

Total memory before patch: 34.65 MB, PHP peak=35.5 MB.
Total memory after patch: 21.87 MB, PHP peak=22.5 MB.

Here are the xhprof screens.

Before

1421844-before-patch.png

After

1421844-after-patch.png.png

Let me know what you guys think. I haven't tested all possibilities yet, hope the testbot is happy as well :)

Status: Needs review » Needs work

The last submitted patch, 1421844-4.patch, failed testing.

swentel’s picture

Ok, so it looks like there should be one cache entry with all data for the wizards, will look further tomorrow.

swentel’s picture

Status: Needs work » Needs review
FileSize
2.49 KB

Ok, this patch should fix all tests.

Important: Views UI needs to be disabled to fully profit from this patch. Haven't traced it completely, but it looks like when looking at a view (eg frontpage), views needs all data, looks like it's loaded somewhere in admin.inc. Haven't run new xhprof tests, but it should be even a little bit better.

swentel’s picture

Ok, great, however, this won't work all the time. I'm looking at a page now managed by page manager where all data is asked. Going to trace and see if we might need additional cid's in the database or not with only the info that is needed for a particular function.

swentel’s picture

Traced it: the full request was coming from entity_views_plugins() from entity module which simply wants to know whether a table had an 'entity type' key and 'base key'. In this particular site, this would have saved me 4MB of cache, now entity module is causing 4 MB of cache for not a lot of data it wants (this could easily be a simple variable which could be cleared on full cache or so (it already implements hook_flush_caches so it could do that).

tim.plunkett’s picture

So it seems that anyone calling views_fetch_data() with no key/table should wrap that themselves?

Grepping through my checkouts of D7 modules, the following modules are doing that:

data: data_get_views_handler_options()
entity: entity_views_plugins()
entityreference_view_widget: entityreference_view_widget_views_plugins()
search_api: search_api_views_update_7101()
searchlight: SearchlightDatasource::getDatatype()
sparql_views: sparql_views_plugin_query_sparql::init()
views_arguments_extras: views_sort_by_arg_order_handler_sort::options_form()

7 out of 3032 modules, one of which is an update hook, makes it seem to me like it should be their problem, not ours.

As far as the patch itself, it makes sense, but should the function signature of views_fetch_data() change to match?

tim.plunkett’s picture

Status: Needs review » Needs work

CNW to make sure that the function signatures of views_fetch_data() and _views_fetch_data() match.

bdragon’s picture

Status: Needs work » Needs review
FileSize
4.63 KB

Opened #1558340: Scalability: Cache base table list to avoid calls to views_fetch_data() to address entity's calling of views_fetch_data().

I was having corruption problems when exporting views using drush. Missing handlers, etc.

I rewrote the patch in a somewhat naive manner (duplicate code blocks that I'm sure could be merged together, but I was just trying to get it to work properly.)

My version does not use 'all' as the all tables entry, it uses the views_data: entry like the original code.

Setting to needs review for code review, but it probably won't be accepted without reworking a bit to not have duplicated lines of code.

bdragon’s picture

Status: Needs review » Needs work

I'm still having trouble with entity. :(

I guess it could be a recursion thing.

bdragon’s picture

Status: Needs work » Needs review
FileSize
4.81 KB

OK, was silly, I wasn't keeping track of whether the cache was fully loaded or not.

Status: Needs review » Needs work

The last submitted patch, 1421844-speedup-views-fetch-data_fixed.patch, failed testing.

bdragon’s picture

Status: Needs work » Needs review
FileSize
5.45 KB

Updating tests.

Status: Needs review » Needs work

The last submitted patch, 1421844-speedup-views-fetch-data_fixed_tests.patch, failed testing.

catch’s picture

Title: Scalability of views_fetch_data() » views_fetch_data() cache item can reach over 10mb in size
Category: feature » bug

Moving this to a bug report, I'm seeing this cache item take around 8mb on a Drupal 6 site I'm profiling at the moment.

Items bigger than 2mb potentially fall foul of max allowed packet or memcache default slab size (which is actually 1mb, but the memcache contrib project compresses large items so in practice you can get away with much bigger).

catch’s picture

Status: Needs work » Needs review
FileSize
5.56 KB

OK the test update was right, but the test that was failing wasn't using that helper which is why it still failed. That particular test now passes for me with this patch, didn't run the full suite locally.

catch’s picture

Started to look at a backport then realised there's a fair bit of code duplication between the if ($table)/else { code paths, moved that to a helper.

catch’s picture

Non-empty patch.

Status: Needs review » Needs work

The last submitted patch, 1421844_views_fetch_data_20.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
4.69 KB

OK not quite, this should fix the test failure.

catch’s picture

Here's 6.x-3.x and 6.x-2.x backports, untested as yet.

catch’s picture

Sorry, better 2.x backport that actually applies..

catch’s picture

OK one more change. I noticed the individual cache entries are getting set before the bigger one. This would mean a potentially long window where specific tables aren't cached at all, so it'd be easy to end up with a cache stampede.

For now I put the large cache entry saving first, so at least if an individual item isn't set yet, the larger one can be picked up. To avoid setting such a large number of items all at once, it'd be necessary to use DrupalCacheArray or a custom on-demand mechanism like #853864: views_get_default_view() - race conditions and memory usage does, however I'm unlikely to have time to work on that at least this week.

Also found a couple of bugs in the 6.x patches while testing, so uploading new versions of all three. That should be the end of spamming this issue for a bit hopefully.

fenda’s picture

I posted this comment on this issue too: #853864: views_get_default_view() - race conditions and memory usage

Might be related to both.

Perhaps my situation is related.

On occasion (cache clearing, module page form save, add a field) my database is returning this error in the mysql log:

120606 5:32:10 [ERROR] Got error -1 when reading table './cyac/cache_views'

I then must run /etc/init.d/mysqld restart for my site to become responsive.

Did a successful cache clear 10 minutes and checked cache_views. views_data:ja and views_data:en are showing 500kb+.

Any 'quick fix' for this? Happening on a production server fairly often.

dawehner’s picture

To understand the issue 100% i wrote an issue summary, please complete it if there is something missing.

geerlingguy’s picture

Status: Needs review » Reviewed & tested by the community

Some extremely encouraging testing results (avg. of 3 runs):

Before Patch

54 rows in {cache_views}; 5.5MB total data size

Flush caches page load: 33871ms, 185.33MB peak RAM
Cached page load: 681ms, 34MB peak RAM

After Patch

745 rows in {cache_views}; 11.5MB total data size

Flush caches page load: 38089ms, 185.5MB peak RAM
Cached page load: 581ms, 17.5MB peak RAM

Differences

Cache clear: +4218ms execution time, +.17MB memory
Cached page load: -100ms execution time, -16.5MB memory

So, for normal use, there's a 15% improvement in execution time, and 49% improvement in memory usage.

Between this patch and the one in #1040790: _field_info_collate_fields() memory usage, a site with a bunch of views, entity bundles, and fields that used to use ~180MB of RAM for every page request is now using ~18MB of RAM. That's a win in my book :)

I've been testing the patch on a dev site all morning, and can't find a problem with it (using multiple views on a page, complex views with multiple relationships, filters, etc., and simple content list views).

geerlingguy’s picture

Note that the speedup is even greater on the live site (I just pushed up the patch). Cache clears take about as long, but since I'm storing the cache_views bin in APC, page request execution time is about 240ms.

dawehner’s picture

Version: 7.x-3.x-dev » 8.x-3.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

This patch is really tested by a lot of people, the patch looks perfect so let's get this in.

I would like to get this ported to d8 first and then backported to 6.x-3.x.
For reasons of stability i guess it might be hard to commit this patch to 6.x-2.x after that.

dawehner’s picture

FileSize
2.51 KB

Just wrote a test for this functionality, which seems to fail completly something weird is going on.

dawehner’s picture

Status: Patch (to be ported) » Needs review
FileSize
7.3 KB

Forgot to enable the views test module.

damiankloip’s picture

Status: Needs review » Reviewed & tested by the community

Patch and tests look good to me.

dawehner’s picture

Version: 8.x-3.x-dev » 6.x-3.x-dev
Status: Reviewed & tested by the community » Needs review

Thank you for the review! Committed and pushed

Back to 6.x-3.x

geerlingguy’s picture

Version: 6.x-3.x-dev » 7.x-3.x-dev

Don't forget 7.x!

damiankloip’s picture

Version: 7.x-3.x-dev » 6.x-3.x-dev

This patch was committed first to 7.x I think.

geerlingguy’s picture

jason.fisher’s picture

Slightly off-topic comment that probably belongs elsewhere .. but couldn't we gzip/unzip serialized data prior to storing on the server, if we are storing it in blobs anyway? It's not going to help with PHP memory usage necessarily, but it could help avoid max_packet_allowed errors and reduce database i/o.

jason.fisher’s picture

Issue summary: View changes

Wrote an issue summary

fgm’s picture

This patch does not completely fix the underlying problem: it allows use of partial caches for _views_fetch_data(), but the complete record being saved along the partial ones (in _views_fetch_data_build()) can still quite easily get over 1 MB and fail at being stored in normally configured memcached instances.

dawehner’s picture

Anyone else using this patch in production for 6.x-3.x ? Would love to get this issue fixed.

  • dawehner committed 9f2c94c on 6.x-3.x authored by catch
    Issue #1421844 by catch, swentel, bdragon, dawehner: views_fetch_data()...
dawehner’s picture

Version: 6.x-3.x-dev » 6.x-2.x-dev

@fgm
memcache is now capable of splitting up the entries, if needed.

Decided to commit the patch from catch in #29, given that it is used in production since quite a while.

Note: I just committed the patch against 6.x-3.x but not against 6.x-2.x

Chris Matthews’s picture

Status: Needs review » Closed (outdated)

The Drupal 6 branch is no longer supported, please check with the D6LTS project if you need further support. For more information as to why this issue was closed, please see issue #3030347: Plan to clean process issue queue