Problem/Motivation
views_fetch_data get the views table information. To not run all the hooks all the time there is a cache in
the database with the cid 'views_data' , which is loaded on every page with a view.
The problem is that the result is getting really big really fast, so on some sites
it is over 10MB.
Proposed resolution
Most of the time you don't have to load all the information, but just the information from one certain table.
If the $table parameter is specified just load the data from a cache specific for the single table.
At the same time write one additional cache entries for each tables, but keep the one for all
tables.
Remaining tasks
- General documentation of the new code
- If it is possible tests.
Original report by [username]
I'm working on a site with dozens of views and a need to scale to hundreds of near-simultaneous authenticated users. One of the memory hogs we're seeing in xhProf is the cache object loaded by views_fetch_data()
and views_cache_get(‘views_data:en’)
, which seem to run every time any view is loaded on the page. That cache object (loaded from memcache) takes up 11 MB per request, so at the scale we want to reach, that's multiple gigabytes just for that item.
... some more, just asking why this is done.
Comment | File | Size | Author |
---|---|---|---|
#33 | views-1421844-33.patch | 7.3 KB | dawehner |
#32 | views-1421844-32.patch | 2.51 KB | dawehner |
#26 | 1421844_views_fetch_data_26.patch | 4.66 KB | catch |
#26 | 1421844-6x-3x-do_not_test.patch | 2.66 KB | catch |
#26 | 1421844-6x-2x-do_not_test.patch | 2.54 KB | catch |
Comments
Comment #1
merlinofchaos CreditAttribution: merlinofchaos commentedThis is unfortunately normal. For a view to be built, Views has to know all of the data it might pull from, and it stores it as one big bucket -- really it's the result of hook_views_data().
This is a known issue with Views but there's really no way to change this without some pretty significant work and most of the attempts to save memory would come with increases in cache loading overhead and ultimately might not end up saving that much.
Comment #2
dawehnerIn php5 we could do some kind of pseudo array object, which lazy-loads the needed data per table.
To be able to do this the cache entries would have to be per base table.
The problem though is that if you have a view with many different tables / fieldapi fields, you end up with many cache_get's which is also bad, if not worse.
Comment #3
dawehnerJust had another idea: Store the views data per view.
It should be probably possible to find out all fields of views_data which are needed for a certain view.
There are probably problems with things like very custom filters/handlers, but maybe some kind of api could be exposed.
Based on this a cache entry per view would be stored, which should be MUCH smaller then a full one.
Maybe these two ideas could be combined as well.
In general i doubt that such kind of work will happen on 2.x...
Comment #4
swentel CreditAttribution: swentel commentedI've been experiencing the same exact issue with a site I was profiling for heavy memory usage. views_fetch_data() was taking up 15mb per request, sometimes just to show only one single view. I've attached a patch which changes the _views_fetch_data() function storing every data per table in a single cid and an extra cid called 'base_tables', only storing info about the base tables. The patch now only loads the views data for the view it needs on that request. While this seems overhead, we can (safely?) assume that in case you have much views (cf underneath), you won't probably put them all on one page. If so, then I'd rather think there's a structural site building problem. And with memcache you can also make your database server even more happier. And with the patch, everything seems to be working still nicely.
Context of the site:
- 26 views, 37 displays
- 73 fields used in views
- 138 fields
- 219 field instances
- 26 content types
The memory drop underneath is done on a page which loads 3 views every single request (it's looking for nid in the url to create 'related' info blocks).
Total memory before patch: 34.65 MB, PHP peak=35.5 MB.
Total memory after patch: 21.87 MB, PHP peak=22.5 MB.
Here are the xhprof screens.
Before
After
Let me know what you guys think. I haven't tested all possibilities yet, hope the testbot is happy as well :)
Comment #6
swentel CreditAttribution: swentel commentedOk, so it looks like there should be one cache entry with all data for the wizards, will look further tomorrow.
Comment #7
swentel CreditAttribution: swentel commentedOk, this patch should fix all tests.
Important: Views UI needs to be disabled to fully profit from this patch. Haven't traced it completely, but it looks like when looking at a view (eg frontpage), views needs all data, looks like it's loaded somewhere in admin.inc. Haven't run new xhprof tests, but it should be even a little bit better.
Comment #8
swentel CreditAttribution: swentel commentedOk, great, however, this won't work all the time. I'm looking at a page now managed by page manager where all data is asked. Going to trace and see if we might need additional cid's in the database or not with only the info that is needed for a particular function.
Comment #9
swentel CreditAttribution: swentel commentedTraced it: the full request was coming from entity_views_plugins() from entity module which simply wants to know whether a table had an 'entity type' key and 'base key'. In this particular site, this would have saved me 4MB of cache, now entity module is causing 4 MB of cache for not a lot of data it wants (this could easily be a simple variable which could be cleared on full cache or so (it already implements hook_flush_caches so it could do that).
Comment #10
tim.plunkettSo it seems that anyone calling
views_fetch_data()
with no key/table should wrap that themselves?Grepping through my checkouts of D7 modules, the following modules are doing that:
data:
data_get_views_handler_options()
entity:
entity_views_plugins()
entityreference_view_widget:
entityreference_view_widget_views_plugins()
search_api:
search_api_views_update_7101()
searchlight:
SearchlightDatasource::getDatatype()
sparql_views:
sparql_views_plugin_query_sparql::init()
views_arguments_extras:
views_sort_by_arg_order_handler_sort::options_form()
7 out of 3032 modules, one of which is an update hook, makes it seem to me like it should be their problem, not ours.
As far as the patch itself, it makes sense, but should the function signature of
views_fetch_data()
change to match?Comment #11
tim.plunkettCNW to make sure that the function signatures of
views_fetch_data()
and_views_fetch_data()
match.Comment #12
bdragon CreditAttribution: bdragon commentedOpened #1558340: Scalability: Cache base table list to avoid calls to views_fetch_data() to address entity's calling of views_fetch_data().
I was having corruption problems when exporting views using drush. Missing handlers, etc.
I rewrote the patch in a somewhat naive manner (duplicate code blocks that I'm sure could be merged together, but I was just trying to get it to work properly.)
My version does not use 'all' as the all tables entry, it uses the views_data: entry like the original code.
Setting to needs review for code review, but it probably won't be accepted without reworking a bit to not have duplicated lines of code.
Comment #13
bdragon CreditAttribution: bdragon commentedI'm still having trouble with entity. :(
I guess it could be a recursion thing.
Comment #14
bdragon CreditAttribution: bdragon commentedOK, was silly, I wasn't keeping track of whether the cache was fully loaded or not.
Comment #16
bdragon CreditAttribution: bdragon commentedUpdating tests.
Comment #18
catchMoving this to a bug report, I'm seeing this cache item take around 8mb on a Drupal 6 site I'm profiling at the moment.
Items bigger than 2mb potentially fall foul of max allowed packet or memcache default slab size (which is actually 1mb, but the memcache contrib project compresses large items so in practice you can get away with much bigger).
Comment #19
catchOK the test update was right, but the test that was failing wasn't using that helper which is why it still failed. That particular test now passes for me with this patch, didn't run the full suite locally.
Comment #20
catchStarted to look at a backport then realised there's a fair bit of code duplication between the if ($table)/else { code paths, moved that to a helper.
Comment #21
catchNon-empty patch.
Comment #23
catchOK not quite, this should fix the test failure.
Comment #24
catchHere's 6.x-3.x and 6.x-2.x backports, untested as yet.
Comment #25
catchSorry, better 2.x backport that actually applies..
Comment #26
catchOK one more change. I noticed the individual cache entries are getting set before the bigger one. This would mean a potentially long window where specific tables aren't cached at all, so it'd be easy to end up with a cache stampede.
For now I put the large cache entry saving first, so at least if an individual item isn't set yet, the larger one can be picked up. To avoid setting such a large number of items all at once, it'd be necessary to use DrupalCacheArray or a custom on-demand mechanism like #853864: views_get_default_view() - race conditions and memory usage does, however I'm unlikely to have time to work on that at least this week.
Also found a couple of bugs in the 6.x patches while testing, so uploading new versions of all three. That should be the end of spamming this issue for a bit hopefully.
Comment #27
fenda CreditAttribution: fenda commentedI posted this comment on this issue too: #853864: views_get_default_view() - race conditions and memory usage
Might be related to both.
Comment #28
dawehnerTo understand the issue 100% i wrote an issue summary, please complete it if there is something missing.
Comment #29
geerlingguy CreditAttribution: geerlingguy commentedSome extremely encouraging testing results (avg. of 3 runs):
Before Patch
54 rows in
{cache_views}
; 5.5MB total data sizeFlush caches page load: 33871ms, 185.33MB peak RAM
Cached page load: 681ms, 34MB peak RAM
After Patch
745 rows in
{cache_views}
; 11.5MB total data sizeFlush caches page load: 38089ms, 185.5MB peak RAM
Cached page load: 581ms, 17.5MB peak RAM
Differences
Cache clear: +4218ms execution time, +.17MB memory
Cached page load: -100ms execution time, -16.5MB memory
So, for normal use, there's a 15% improvement in execution time, and 49% improvement in memory usage.
Between this patch and the one in #1040790: _field_info_collate_fields() memory usage, a site with a bunch of views, entity bundles, and fields that used to use ~180MB of RAM for every page request is now using ~18MB of RAM. That's a win in my book :)
I've been testing the patch on a dev site all morning, and can't find a problem with it (using multiple views on a page, complex views with multiple relationships, filters, etc., and simple content list views).
Comment #30
geerlingguy CreditAttribution: geerlingguy commentedNote that the speedup is even greater on the live site (I just pushed up the patch). Cache clears take about as long, but since I'm storing the cache_views bin in APC, page request execution time is about 240ms.
Comment #31
dawehnerThis patch is really tested by a lot of people, the patch looks perfect so let's get this in.
I would like to get this ported to d8 first and then backported to 6.x-3.x.
For reasons of stability i guess it might be hard to commit this patch to 6.x-2.x after that.
Comment #32
dawehnerJust wrote a test for this functionality, which seems to fail completly something weird is going on.
Comment #33
dawehnerForgot to enable the views test module.
Comment #34
damiankloip CreditAttribution: damiankloip commentedPatch and tests look good to me.
Comment #35
dawehnerThank you for the review! Committed and pushed
Back to 6.x-3.x
Comment #36
geerlingguy CreditAttribution: geerlingguy commentedDon't forget 7.x!
Comment #37
damiankloip CreditAttribution: damiankloip commentedThis patch was committed first to 7.x I think.
Comment #38
geerlingguy CreditAttribution: geerlingguy commentedAh, I see: http://drupalcode.org/project/views.git/commit/0ba1a86
Well, thanks!
Comment #39
jason.fisher CreditAttribution: jason.fisher commentedSlightly off-topic comment that probably belongs elsewhere .. but couldn't we gzip/unzip serialized data prior to storing on the server, if we are storing it in blobs anyway? It's not going to help with PHP memory usage necessarily, but it could help avoid max_packet_allowed errors and reduce database i/o.
Comment #39.0
jason.fisher CreditAttribution: jason.fisher commentedWrote an issue summary
Comment #40
fgmThis patch does not completely fix the underlying problem: it allows use of partial caches for
_views_fetch_data()
, but the complete record being saved along the partial ones (in_views_fetch_data_build()
) can still quite easily get over 1 MB and fail at being stored in normally configured memcached instances.Comment #41
dawehnerAnyone else using this patch in production for 6.x-3.x ? Would love to get this issue fixed.
Comment #43
dawehner@fgm
memcache is now capable of splitting up the entries, if needed.
Decided to commit the patch from catch in #29, given that it is used in production since quite a while.
Note: I just committed the patch against 6.x-3.x but not against 6.x-2.x
Comment #44
Chris Matthews CreditAttribution: Chris Matthews as a volunteer commentedThe Drupal 6 branch is no longer supported, please check with the D6LTS project if you need further support. For more information as to why this issue was closed, please see issue #3030347: Plan to clean process issue queue