
Memory usage and i/o from config objects

Project:Configuration management initiative
Component:Code
Category:task
Priority:normal
Assigned:Unassigned
Status:active

Issue Summary

I started this as a comment on http://groups.drupal.org/node/155559, but chx suggested it'd be better as an issue.

If I were writing it as an issue I'd structure it differently, but since it's already written, I'm posting it as is. This is more about some of the assumptions in the groups post than about the overall architecture, which in general I like. But the fact that there are config objects and that they're lazy loaded is not, by itself, necessarily going to solve the current performance issues compared to the variable cache. We may need to add a layer in between to make things slimmer on both counts (memory and i/o).

The advantage of storing configuration this way is that, rather than the current system where the entire contents of the variable table are loaded into memory on every page request (which limits the data that can be stored there), we can instead load only the specific parts of configuration we need for the current request.

The main reason the full contents of the variables table are loaded on every request is that, with the variable API, we can't distinguish between data that is set to the default and data that is not set at all. Apart from performance, there isn't much stopping it from reading variables individually from the db. And the fact that it's all stored in one table is not the reason it's such a bad fit for stuffing lots of data into; there are multiple reasons it's bad as it currently stands. With files there is a good reason to split them up (no row-level locking on files), but if the files are only read when being synced with the active data store or diffed, doesn't this really apply to the active store rather than the file store?
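To make the default-versus-unset problem concrete, here is a minimal Python sketch that counts queries. It is not Drupal code: `FakeVariableTable`, `variable_get_individual` and `PreloadedVariables` are hypothetical names. Reading variables one at a time costs a query for every lookup, including lookups of unset variables that simply fall back to their default, whereas loading the whole table once lets any missing name be treated as unset without further queries.

```python
_MISSING = object()  # sentinel meaning "no row stored for this name"


class FakeVariableTable:
    """Hypothetical stand-in for the variables table that counts queries."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def fetch_one(self, name):
        self.queries += 1
        return self.rows.get(name, _MISSING)

    def fetch_all(self):
        self.queries += 1
        return dict(self.rows)


def variable_get_individual(table, name, default=None):
    # With no record of which names are unset, every miss has to ask the
    # database before it can fall back to the default.
    value = table.fetch_one(name)
    return default if value is _MISSING else value


class PreloadedVariables:
    """Loads everything once; a name absent from the dict is known to be unset."""

    def __init__(self, table):
        self.values = table.fetch_all()

    def get(self, name, default=None):
        return self.values.get(name, default)


if __name__ == "__main__":
    rows = {"site_name": "Example", "cron_last": 1308000000}
    wanted = ["site_name", "site_slogan", "site_mission", "cron_last", "theme_default"]

    individual = FakeVariableTable(rows)
    for name in wanted:
        variable_get_individual(individual, name, default="")
    print("individual reads:", individual.queries)

    preloaded_table = FakeVariableTable(rows)
    preloaded = PreloadedVariables(preloaded_table)
    for name in wanted:
        preloaded.get(name, default="")
    print("preloaded reads:", preloaded_table.queries)
```

With five lookups, three of them for unset names, the individual approach issues five queries and the preloaded one issues a single query.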

That follows on to this:

Finally, go to admin/configuration/system/config or run drush config-update (note: final location/command likely different; making these up right now). This will outline the differences between on-disk and the active store. If the changes seem good, go ahead and confirm to overwrite the content of the active store with the on-disk configuration. The admin tool will offer to regenerate the file signatures if necessary.

chx said in IRC that this will be a list of config object keys, and then individual values will be diffed. If that's the case, it sounds fine to me; I would be concerned about a rebuild process that requires loading the full contents of both the file store and the active store into memory and diffing them.
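A key-by-key diff along the lines chx describes could be sketched as below. This is a hypothetical Python illustration, not the initiative's code: `diff_stores` and the two loader callables are made up for the example. Only the key lists are held in full; each config object is loaded from the file store and the active store one pair at a time, compared, and then dropped, so peak memory is bounded by the largest single object rather than by the size of both stores.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# A loader returns the config object stored under a key (hypothetical signature).
Loader = Callable[[str], dict]
Change = Tuple[str, Optional[dict], Optional[dict]]


def diff_stores(
    file_keys: Iterable[str],
    active_keys: Iterable[str],
    load_file: Loader,
    load_active: Loader,
) -> Iterator[Change]:
    """Yield (key, on_disk, active) for every config object that differs.

    Objects are loaded lazily, one key at a time; None marks an object that
    exists in only one of the two stores.
    """
    file_set, active_set = set(file_keys), set(active_keys)
    for key in sorted(file_set | active_set):
        on_disk = load_file(key) if key in file_set else None
        active = load_active(key) if key in active_set else None
        if on_disk != active:
            yield key, on_disk, active


if __name__ == "__main__":
    files = {"system.site": {"name": "New name"}, "node.settings": {"preview": 1}}
    active = {
        "system.site": {"name": "Old name"},
        "node.settings": {"preview": 1},
        "old.module": {"x": 1},
    }
    for key, on_disk, current in diff_stores(files, active, files.__getitem__, active.__getitem__):
        print(key, on_disk, current)
```

Because `diff_stores` is a generator, an admin screen or drush command could stream the differences to the user without ever materialising either store as a whole.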

While at first glance the structure looks like the "variables" table in Drupal 7 and below, the fundamental difference is that this table stores configuration objects (say, every site information setting: name, mission, etc.), whereas the variables table stored single values (like the site name). Also, as said above, we are not loading the whole table into memory.
All site configuration data gets read out of this wrapper.

Not loading the entire table into memory goes without saying, but there are other concerns here:

Unless I'm misreading, every 'default' value for variables will be stored here, not just those overridden from defaults. So if you have a large site config object, then just to get the site name you're going to need to load the full site_information object into memory on each request. On many sites that object holds NULL values for the mission statement, 404 page, etc., as well as values like the site e-mail address that are not usually needed on each request.

A few different large-ish config objects with all their default properties could end up outweighing the memory savings of not loading the variables table into memory each request.
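One way to keep such objects slim, assuming defaults can be declared in code, is to store only the values that differ from the defaults and merge the two on read. The Python sketch below is hypothetical (`SITE_INFORMATION_DEFAULTS`, `overrides_only` and `read_config` are made-up names, and "size" here is just the length of a `json.dumps` serialization). It compares what would be stored for a site that has changed only its name and e-mail address.

```python
import json

# Hypothetical defaults declared in code for a site_information object.
SITE_INFORMATION_DEFAULTS = {
    "name": "Drupal",
    "mail": "",
    "slogan": "",
    "mission": None,
    "page_403": None,
    "page_404": None,
    "frontpage": "node",
}


def overrides_only(values, defaults):
    """Keep only properties that are undeclared or differ from their default."""
    return {k: v for k, v in values.items() if k not in defaults or defaults[k] != v}


def read_config(stored, defaults):
    """Merge stored overrides over the in-code defaults when the object is read."""
    merged = dict(defaults)
    merged.update(stored)
    return merged


if __name__ == "__main__":
    full = dict(SITE_INFORMATION_DEFAULTS, name="Example site", mail="admin@example.com")
    slim = overrides_only(full, SITE_INFORMATION_DEFAULTS)
    print("full object:", len(json.dumps(full)), "bytes")
    print("overrides only:", len(json.dumps(slim)), "bytes")
    assert read_config(slim, SITE_INFORMATION_DEFAULTS) == full
```

The trade-off is that the defaults then live in code and must be available wherever the object is read, in exchange for not storing or loading the NULLs and empty strings on every request.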

During a standard D7 page request with no contrib modules installed, something like 60 variables are requested, and these would come from multiple different configuration objects. Let's say 10 configuration objects with 6 variables each (although each object may encapsulate 10, 15 or 20 different properties, which are also going to be loaded).
Core currently writes around 100-200 variables to the db if you install and click around a bit.

So that's trading one big cache item holding an array of 100-200 variables for maybe 10 database queries, each returning an object with 10-20 properties (some NULL or default strings). In practice there won't necessarily be a memory gain there, and there will definitely be a cost in database i/o.

With a lot of modules enabled on an older site, D6/7 can easily reach 1,000-2,000 variables stored in the database. Say 50 installed modules each check one variable on every request from their own config object: reading an object with 10-20 variables for each of those 50 means 500-1,000 values loaded, so we're in the same range even with lazy loading.
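The back-of-envelope comparison in the last few paragraphs can be written out as a small calculation. This Python sketch only restates the figures above (about 10 config objects of 10-20 properties for core, and 50 module objects of 10-20 properties on a busy site); `lazy_load_cost` is a hypothetical helper, and it assumes one query per config object with nothing cached in between.

```python
from typing import NamedTuple


class LazyLoadCost(NamedTuple):
    queries: int
    values_loaded: int


def lazy_load_cost(objects_touched: int, properties_per_object: int) -> LazyLoadCost:
    """Cost of loading whole config objects on demand, one query per object."""
    return LazyLoadCost(objects_touched, objects_touched * properties_per_object)


if __name__ == "__main__":
    # Core only: ~10 objects, versus one cache item holding 100-200 variables.
    for props in (10, 20):
        print("core", props, lazy_load_cost(10, props))
    # Busy older site: 50 modules each touching their own object,
    # versus a variables table of 1,000-2,000 rows.
    for props in (10, 20):
        print("contrib", props, lazy_load_cost(50, props))
```

With these inputs, lazy loading pulls in 100-200 values over 10 queries for core and 500-1,000 values over 50 queries for the busy site, against a single cache item of 100-200 or 1,000-2,000 variables respectively.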

I'm hopeful that the CacheArrayObject work I'm doing will provide a way to reduce the memory footprint of the current variables table in D7, and it may apply here too (both for memory and for reducing overall read i/o). How much improvement there is (or isn't) with the new system will depend very much on what gets stored where. But I don't think it can be taken as a given that it will be better from a memory and i/o standpoint if the default behaviour is to load a full config object, with all its declared variables, whenever one value is requested; there may need to be an extra layer to smooth things out.
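The general idea behind such a cache array layer can be sketched roughly as follows. This is a hypothetical Python illustration of the pattern, not the CacheArrayObject code itself: `CacheArray`, the dict used as a cache bin and the `resolve` callback are all assumptions. The object starts from whatever keys earlier requests cached, resolves any other key from the backing store on first access, and at the end of the request writes back the keys that were actually used. The cached item therefore converges on the values requests really need rather than on everything declared, and because a resolved `None` is stored too, "not set" is remembered as well.

```python
from typing import Any, Callable, Dict, MutableMapping, Set


class CacheArray:
    """Lazily filled, self-persisting subset of a larger key/value store."""

    def __init__(self, cid: str, cache: MutableMapping[str, Dict[str, Any]],
                 resolve: Callable[[str], Any]):
        self.cid = cid
        self.cache = cache
        self.resolve = resolve  # loads a single key from the backing store
        self.storage: Dict[str, Any] = dict(cache.get(cid, {}))
        self.new_keys: Set[str] = set()

    def __getitem__(self, key: str) -> Any:
        if key not in self.storage:
            # Miss: fetch just this key and remember that it was needed.
            self.storage[key] = self.resolve(key)
            self.new_keys.add(key)
        return self.storage[key]

    def persist(self) -> None:
        """Call at the end of the request: add newly used keys to the cached item."""
        if self.new_keys:
            merged = dict(self.cache.get(self.cid, {}))
            for key in self.new_keys:
                merged[key] = self.storage[key]
            self.cache[self.cid] = merged
            self.new_keys.clear()


if __name__ == "__main__":
    backend = {"site_name": "Example", "site_mail": "a@example.com"}
    lookups = []

    def resolve(key):
        lookups.append(key)
        return backend.get(key)

    cache_bin = {}
    first = CacheArray("variables", cache_bin, resolve)
    first["site_name"]
    first.persist()
    second = CacheArray("variables", cache_bin, resolve)
    second["site_name"]  # served from the cached item, no backend lookup
    print(lookups)
```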

Comments

#1

n * namelen * size / 1024 / 1024 ~= total size (MB)

Let's say 2,000 variables, average name length ~= 20, average value size ~= 50 (a lot, for the sake of the example): this is less than 2MB total, which is not that much. (2,000 is really a lot; on the sites I usually work with, 1,000 is about the maximum I see.)

I'd say that doing 10 SQL queries to load the lot piecemeal, only to end up with 90% of the variables loaded anyway, will be a performance killer rather than a real performance gain.

Storing 10 variables takes 10 * 20 * 50 = 10,000 bytes, which is probably less than the core and PHP stack need just to build and run the single SQL query that fetches the data. Smaller caches, why not; but incredibly small caches are a huge mistake.

#2

The site I usually check for this example has over 2,200 variables stored, and that is after a couple of culls of stale variables. In xhprof this comes in at 1.6MB.

Most sites tend to reach more like 1,000-1,500+ once they've been running for a long time, so closer to 1MB or less.
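The "closer to 1MB" figure comes from scaling the xhprof measurement linearly. A minimal Python sketch, assuming the per-variable cost stays roughly constant from site to site (it won't exactly, since value sizes vary); `estimate_mb` is a hypothetical helper:

```python
def estimate_mb(measured_mb: float, measured_count: int, target_count: int) -> float:
    """Linearly scale one measured sample to a different number of variables."""
    return measured_mb * target_count / measured_count


if __name__ == "__main__":
    per_variable_kb = 1.6 * 1024 / 2200
    print(f"~{per_variable_kb:.2f} KB per variable")
    for count in (1000, 1500):
        print(count, "variables:", round(estimate_mb(1.6, 2200, count), 2), "MB")
```

That works out at roughly 0.73MB for 1,000 variables and 1.09MB for 1,500.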

However, the idea of this system is to take over other configuration too: fields, instances, the system table, possibly things like node types; all sorts of stuff could be moved to it (Views?). In that case you absolutely can't load everything in one go, but there is still a trade-off. And even what used to be in variables is going to be split up, so the argument applies to a straight conversion as well as to bringing other things in.

#3

I think development of the new variable system should start with a single cache entry, for the sake of simplicity, which could be split up later after some benchmarking. I'd tend to think that, of all the things we could legitimately remove from memory, variables are probably the last we would want to remove for performance reasons.

#4

I would agree with a single cache entry for any one request, but I would want to look seriously at varying that entry by content type, request method, and possibly very low-level context such as the path root ('node', 'admin', etc.).
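Varying the single per-request cache entry by that kind of context might look something like the Python sketch below. The key format and the `config_cache_id` helper are hypothetical, and each dimension is treated as an opaque string. Requests that share a method, content type and path root share one cache item, so each item only needs to hold the values those requests tend to use.

```python
def config_cache_id(method: str, content_type: str, path: str) -> str:
    """Build a cache ID from coarse request context (hypothetical format)."""
    # Path root: first component of the internal path, e.g. 'node' or 'admin'.
    root = path.strip("/").split("/", 1)[0] or "<front>"
    return ":".join(("config", method.upper(), content_type, root))


if __name__ == "__main__":
    print(config_cache_id("get", "text/html", "node/1"))
    print(config_cache_id("POST", "text/html", "admin/config/system/site-information"))
    print(config_cache_id("GET", "text/html", ""))
```

Keeping the dimensions this coarse keeps the number of distinct cache items small, which matters if each one is meant to be a single read per request.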

There's also the question of whether the config system has to be initialized to serve cached pages. In core at the moment you can bypass variable_init() and hook_boot(), so the only required cache hit is the actual cache_get() for the page itself, and the database may not be initialized at all. It would be a shame to lose that.

#5

You're right, but you can still use a partial configuration before actually loading the cache entry. That is, exactly as now, keep working with an incomplete $conf array, then load the cache once the necessary systems are up.
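The approach discussed in #4 and #5 can be sketched as a bootstrap order in which a page-cache hit is answered using only settings-level overrides, and the full configuration cache entry is loaded only on a miss. Everything here is a hypothetical Python illustration (`Config`, `handle_request`, the dict-based caches and the `cache` setting name); it does not model Drupal's actual bootstrap phases.

```python
from typing import Any, Callable, Dict


class Config:
    """Starts from a partial array of overrides; loads the full entry on demand."""

    def __init__(self, overrides: Dict[str, Any], load_full: Callable[[], Dict[str, Any]]):
        self.overrides = dict(overrides)  # e.g. what settings.php provides
        self.values = dict(overrides)
        self.load_full = load_full
        self.loaded = False

    def get(self, name: str, default: Any = None) -> Any:
        if name not in self.values and not self.loaded:
            self.load()
        return self.values.get(name, default)

    def load(self) -> None:
        if self.loaded:
            return
        # Overrides from settings keep priority over the cached values.
        self.values = {**self.load_full(), **self.overrides}
        self.loaded = True


def handle_request(path: str, page_cache: Dict[str, str], config: Config) -> str:
    # Only the partial, settings-level configuration is consulted here.
    if config.get("cache", False):
        cached = page_cache.get(path)
        if cached is not None:
            return cached  # the full configuration was never loaded
    config.load()  # the rest of the bootstrap needs the full configuration
    return f"<h1>{config.get('site_name')}</h1>"


if __name__ == "__main__":
    loads = []

    def load_full():
        loads.append(1)
        return {"site_name": "Example", "cache": False}

    config = Config({"cache": True}, load_full)
    print(handle_request("node/1", {"node/1": "<cached page>"}, config), len(loads))
    print(handle_request("node/2", {"node/1": "<cached page>"}, config), len(loads))
```

The cached page for node/1 is served without loading the configuration entry at all; the miss for node/2 triggers exactly one load.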
