Reduce RAM resource consumption

panatlantica - March 26, 2008 - 14:15
Project:Drupal
Version:6.x-dev
Component:update.module
Category:task
Priority:minor
Assigned:Unassigned
Status:active
Description

I have noticed that when we install Update Status on a Drupal (5.x) site that has quite a lot of modules, it consumes A LOT of RAM. On regular installations it is quite likely that the admin page is no longer displayed at all (instead you get a PHP out-of-memory error). We have one particular site where Update Status runs only if the memory allocated to PHP (set in php.ini) is as high as 1 GB! Below that value the admin page or the Update Status page itself produces the "out of memory" error.

Are there any recommendations on memory consumption of Update Status? Interestingly, the Devel module does not seem to report this high memory demand once you reach 60 or more third-party Drupal modules...

Any suggestions on how to better tune things are highly welcome...

#1

dww - March 26, 2008 - 17:13

Yeah, unfortunately, update_status just has a lot of data to process, especially if you have a ton of modules installed. I don't know what to say. Someone could potentially go through and try to optimize RAM consumption -- there are a few places we might be passing around more data than we need. That said, I'm not really interested in spending much time on this, since this sort of refactoring is very unlikely to get into the version of update.module in D6 core (unless we can show it's a real bug), and I don't want to put effort into the 5.x version that won't benefit the 6.x version.

#2

panatlantica - March 26, 2008 - 21:22

Yep, I completely agree with you on your point here. This is basically why I chose "minor" and "support request", in the sense of: if anyone happens to have done research on this topic, could he/she share his/her findings with us ;-)

#3

panatlantica - April 9, 2008 - 08:01

Sorry to keep bothering everybody about this, but Update Status can become quite a resource hog on sites with MANY modules, and by now there are already quite a few complex Drupal applications out there on the web.

So I have been wondering whether it would be an idea to change the way Update Status collects its information to something more "resource-protecting". Instead of trying to pull the status of ALL installed modules when entering the admin page, when running cron, or when forcing a version check, wouldn't it be an idea to stretch this process out once a certain number of installed modules is present, and have the status retrieved in chunks on every cron run? Certainly, this will not allow for an immediate "status update" (which could still be forced, though), but it would allow large sites with MANY modules and lots of functionality to still use Update Status without seeing "Server Error" when updating the statuses of all modules, and without cron producing errors (even though nearly all available resources are given to PHP just to handle the load once Update Status does its job)...

Since Update Status is in core with version 6 I see this as a real advantage for any user with big functionalities on the site and any version of Drupal really.

#4

modctek - April 10, 2008 - 16:57

I don't see myself upgrading several of my Drupal sites to 6 for many months until the needed modules are upgraded, and I absolutely loved the info provided by update_status. Sadly, I can't use it anymore, as it just won't load on some of my more complex sites. 1GB of PHP Memory?!? My hosting provider would hunt me down! I've been incrementally increasing it trying to find out what would be an acceptable amount, and so far even 64MB doesn't do it anymore. Not sure when I hit the tipping point, but by my count, we are using 60+ modules (though some of them are modules like CCK with multiple sub-modules) on our biggest site, and update_status just won't work anymore.

Anyone know of a way to track module updates without having to crawl each module page manually? I like to keep the modules updated to the latest versions if at all possible, and it's just become a nightmare without update_status!

#5

panatlantica - April 20, 2008 - 15:28

Yeah, this is exactly our problem too. It won't be much different on Drupal 6. All is really fine with "regular" and simple Drupal installations, but we start having to increase memory to over 1 GB for PHP even on sites that have 30+ modules.

A method to break down update status retrieval into smaller chunks once a certain number of modules has been reached (or alternatively a method to break up the update search into such chunks through an additional configuration field) would most certainly be helpful for these massive Drupal installations, which are becoming more commonplace as Drupal clearly finds its way from being a pure CMS to being something more like a web application framework!

#6

panatlantica - May 15, 2008 - 07:47

Hi everybody,

I think that some of the problems mentioned here as well as quite a lot of other support requests I found for update status module (which I DO absolutely love!!!) seem to be due to the way update status was built. In other words, the poor module seems to be "broken by design"...

There are a lot of posts, as I said, that speak of errors with this module that, in my humble opinion have to do a) with the way update status stores its information in the database, and as a consequence of which, b) how update data is ultimately acquired.

Currently, update status does not use any tables of its own in the database, which I find somewhat problematic:

It currently stores some info on last update runs etc. in the Variables table. Ok, that's fine.
It seems to store the acquired data on updates in one of Drupal's cache tables... muhaaa... why? You clear the cache and the already existing data is gone - not so cool.

With this concept it appears more complicated to "slowly" aggregate data. Currently, all update status information is aggregated at the same time. The module goes through all installed projects and immediately retrieves the information for each of them from drupal.org. If drupal.org is slow or unresponsive we are in trouble. If more than 40 modules are installed we reach a critical resource consumption error.

With Update Status built into D6, this is even more important, as more people will probably be using update status from then on!

My proposal thus would be to give update status its own database table, separated from the cache table, and to introduce a setting that defines how many modules are checked per cron run (all, 10, 20, 30, 40, 50, ...) to make update status adaptable to varying server environments and resource availability. Also, a mechanism should be in place to give drupal.org a break if it is slow or unresponsive, i.e. a time-out and retry-later mechanism.
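A minimal sketch of the chunked, retry-later idea proposed above. It is written in Python purely for illustration and is not update_status code; `cron_run`, the `state` dictionary and the `fetch` callback are all hypothetical names. Each run checks at most `chunk_size` projects, resumes where the previous run stopped, and re-queues projects whose fetch timed out.

```python
def cron_run(projects, state, fetch, chunk_size=10):
    """Check at most chunk_size projects and remember where to resume.

    state is a dict with keys "cursor", "retry" and "status" (hypothetical layout).
    fetch(project) returns status data or raises TimeoutError.
    """
    # Projects that timed out on an earlier run are checked first.
    queue = list(state["retry"])[:chunk_size]
    cursor = state["cursor"]
    # Fill the rest of the chunk from the position reached last time.
    while len(queue) < chunk_size and cursor < len(projects):
        if projects[cursor] not in queue:
            queue.append(projects[cursor])
        cursor += 1
    retry = [p for p in state["retry"] if p not in queue]
    for project in queue:
        try:
            state["status"][project] = fetch(project)
        except TimeoutError:
            retry.append(project)  # give the server a break, try again later
    state["retry"] = retry
    # Wrap around once every project has been visited.
    state["cursor"] = 0 if cursor >= len(projects) else cursor
    return queue
```

With five projects and `chunk_size=2`, three cron runs cover the whole list, and a project that timed out in the first run is simply picked up again at the start of the second.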

I hope my suggestions are of some help,

Thanks a lot, Steve

#7

dww - May 15, 2008 - 19:30

First of all: #155450: backport separate {cache_update_status} table from D6 core ;)

Secondly, yes, it's a problem that update_status is doing everything all at once, however, precisely because it's in core, it's now very difficult to do anything about that. :( Unless I can convince Gabor that a serious reorganization of the code is necessary to get it to work on sites with more than ~30 modules, we're doomed in D6, since that's now in bug-fix-only mode.

So, sadly, the first place to do anything about this is in D7 core, which means no one will actually see it for ~1 year from now. I'm not going to restructure the D5 contrib module in such a way that isn't going to go into D6 core...

Side note: I bet one of the other problems is modules where the maintainer is irresponsible and generating tons of releases. People who think they should make a new official release after each CVS commit, etc. :( If you happen to be using one of these modules, feel free to submit a bug report to their issue queues, asking them to slow down with the official releases and not be so wasteful.

#8

panatlantica - May 22, 2008 - 15:43

Sigh... that's really a shame that this issue hasn't been addressed any earlier since there were really quite a lot of hints to it in former bug reports by various people...

I do agree with you, quite a lot of people issue new releases with every CVS commit, and surely this should not be best practice either.

I found another quite interesting article on the matter - http://2bits.com/articles/measuring-memory-consumption-by-drupal-bootstr...

Further down in the comments you'll find some notes on update_status as well.

It really is a big issue that update_status found its way into D6 as it is. We are just starting to use D6 for new projects and especially on large projects with tons of modules the task of manually keeping track of new module releases becomes a pure pain without update_status.

Please see if things CAN still be fixed in Version 6 - otherwise quite a lot of Drupal developers and integrators will seriously consider skipping Version 6 altogether, which is sad sad sad...

#9

catch - May 22, 2008 - 17:15

If this is bringing sites down, and there wouldn't be any breakage of external APIs, then I don't see why something couldn't get into D6 since it probably counts as a critical bug. Can we move this to core/D7 maybe?

#10

dww - May 22, 2008 - 17:37

*sigh* ... It's really a shame that more people didn't help test update_status and help move it into core when I was doing all that work (mostly on my own).

Anyway, I sent a note to the devel list to start a new thread about an old topic: http://lists.drupal.org/pipermail/development/2008-May/029947.html

In terms of changing this in D6 core, I guess we'll just have to try to see what Gabor/Dries think, but I'm not terribly optimistic they'll support a big refactoring in a stable version of core.

#11

pwolanin - May 22, 2008 - 20:58

@dww - have not really looked at the internals yet, but is it possible for update.module (or update_status) to fetch the information for one module/package at a time? Also, since you are rarely using this data, perhaps it could be cached in smaller chunks - like per module?

#12

Pasqualle - May 22, 2008 - 21:38

1GB for update_status? I just can't believe that update_status data can be that much.. Can somebody make a proper measurement of memory usage?

#13

kbahey - May 22, 2008 - 22:46

@dww

Would it be possible for the update module to use a table and process things one module/version at a time? Whether it is retrieval from d.o, or checking that against the local?

By moving this from RAM to the database, it may help the situation?

#14

dww - May 23, 2008 - 06:43
Title:PHP Resource Consumption...» Reduce RAM resource consumption
Project:Update Status» Drupal
Version:5.x-2.2» 6.x-dev
Component:Code» update.module
Category:support request» task
Priority:minor» normal

- This isn't a minor support request, this is definitely a task that should be done, and perhaps considered critical.

- The most important question is can this be solved in D6 core or not?

- I don't believe 1 gig for update status alone. I think the problem is on sites where there are tons of modules already, bloating RAM usage on their own, and update_status just pushes it over the top. That said, I would like to see real numbers to know exactly what we're dealing with to decide if this is normal or critical.

- I'm not convinced that just caching per module will actually help total RAM consumption if that's all we change. I'm sure that "By moving this from RAM to the database, it may help the situation?" is fantasy. The data is only useful to update status once it's in RAM and being processed. The fact we store it in the DB is meaningless unless we load it into RAM to compute the status. And, all the data is already in the DB, since we cache everything we fetch, and everything we compute about the current status.

That said, if we split up the functions and data structures so that we do everything on a per-project granularity (not per module, btw), throw out data we don't need, and try to avoid processing multiple projects in the same page load, we might be able to greatly reduce the footprint. We'd have to:
A) Fetch and parse the XML release data only a project or two at a time, ideally via batch processing.
B) Try to throw out as much of the data as possible (e.g. all releases older than the currently installed release, fields included in the XML we don't actually care about for update status (if any), etc).
C) Compute the status for each project separately, also via batches.
D) Once we compute the status for a given project, cache just the status info in the DB, and only load that in update_requirements() and on the update status report.
E) Aggressively clear this cache of the current status for a project if any of the .info files change (pretty sure we already do this, but we should make sure it's still happening after whatever other changes we might make).

If we did all that, I think we'd be in much better shape to avoid running out of RAM on sites with tons of modules and/or projects that make way too many releases. The point is not that the data lives in the DB instead of RAM, it's that we ditch as much of the data as possible, and try to only process a subset of the data on any given page load.
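Steps A, B and D above can be sketched as follows. This is a Python illustration under stated assumptions, not update.module code: the XML layout (`<release>` elements with a `<version>` child, listed newest first) is a simplified stand-in for the real release history, and `slim_releases`, `refresh_project`, `fetch_xml` and `cache` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def slim_releases(xml_text, installed):
    """Keep releases from the newest down to the installed one; drop older ones.

    Assumes releases are listed newest first (an assumption of this sketch).
    """
    kept = []
    for release in ET.fromstring(xml_text).iter("release"):
        version = release.findtext("version")
        kept.append(version)
        if version == installed:
            break  # everything after this point is older than what is installed
    return kept


def refresh_project(name, installed, fetch_xml, cache):
    """Process one project at a time and cache only its compact status."""
    kept = slim_releases(fetch_xml(name), installed)
    cache[name] = {
        "installed": installed,
        "latest": kept[0] if kept else None,
        "up_to_date": bool(kept) and kept[0] == installed,
    }
    return cache[name]
```

The point of the design is that only the small per-project summary survives the call; the parsed XML and the list of releases go out of scope as soon as `refresh_project` returns, so a report page would only need to load the summaries.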

I fear Gabor/Dries will refuse such a patch in stable core, unless there's clear evidence that this is a serious bug in the current design. But, it's going to require API changes (at least in the internal API of update.module itself), and those almost never change during a stable series unless required to plug a security hole. As an abuse of this exception, we could try to claim that sites disabling update.module due to the RAM requirements would be a security liability for Drupal, but I doubt that's going to fly. ;)

We need:

1) Documented, reliable RAM usage profiling of update(_status).module on various sites, showing the number of modules enabled, the RAM usage before enabling update(_status), and the RAM usage after enabling it.

2) Comment by Dries/Gabor about the feasibility of changing any of this in D6.

If other people can wrangle these, I'm willing to do the refactoring.

#15

Pasqualle - May 23, 2008 - 08:28

I would rather hack core than disable the update_status module.

#16

JohnAlbin - May 23, 2008 - 16:26

subscribing so I can read this in more detail later.

#17

Crell - May 24, 2008 - 01:52
Priority:normal» critical

Given that I have a not-really-interesting D6 site that takes up 40 MB on an alarming basis, it looks like D6 in general has some memory issues to work out. I am therefore going to bump this up to critical. I agree that we do need some better profiling to figure out exactly why D6 is so memory hungry, especially after we chopped out 25% of the memory usage via code refactoring already.

#18

KingMoore - May 27, 2008 - 00:41

Hello,
Is there an easy way to measure the resources consumed by various modules?

#19

wretched sinner... - May 28, 2008 - 01:00

subscribing to help test patches

#20

BioALIEN - June 10, 2008 - 11:59

We maintain a few very large sites (definitely 30+ modules). But they're running fine with a PHP memory limit of 64MB. A few LAMP tweaks, which any average sysadmin should be able to carry out, did the trick for us. In fact, we've grouped over 10 of these sites onto one box and it's running fine after the initial problem was discovered.

I don't believe for a second that allocating 1GIG to PHP is a smart and common sense thing to do. Have you evaluated other alternatives like an op code cache, optimised apache and the database? Some OS (FreeBSD for example) use different memory management so try switching your setup.

While I am fully supportive of making core as resource friendly as possible out of the box, I agree with dww's concerns. This issue could be blown out of proportion by a few bad modules that have been installed. I guess we need some tests to identify where this excess load is coming from.

@KingMoore: I believe there is a project by a SoC student that returns this info.

#21

Gábor Hojtsy - June 2, 2008 - 19:06

Agreed with Crell in #17. If update_status is the culprit, that should be shown, then fixed in D7 (with benchmarks showing that it is fixed) and hopefully backported if possible with a reasonable fix. Looks like there would be enough interest, so it would not be dww only to take this on, right?

#22

modctek - June 3, 2008 - 17:10

Unfortunately, some of us are running on shared environments where our options to change setups/tweak server settings are very limited. And I'd still very much like to have something I can use on the 5.x platform until I can reasonably migrate our site to 6.x

#23

panatlantica - June 4, 2008 - 13:06

Some feedback - Update Status in Drupal 6:
It really appears to be a problem more prone to happen in Drupal 5.x than in Drupal 6.x -- we are currently building our first larger site with D6 (including the core Update Status module as well as the Update Advanced module for more control). Like our other sites, this site is hosted on a dedicated server with quite some hardware resources, running Debian Etch and a FastCGI setup for better performance under high load.

In this current installation we've got 45 modules installed. They are (more or less) the same modules (just for Drupal 6.2) as on the other machine, which has a VERY similar setup, also 45 modules, on Drupal 5.7.

It is interesting to note that the resource problem (aka out-of-memory error) does NOT occur on the Drupal 6.x setup but only on the Drupal 5.x setup.

Nonetheless, I think some design enhancements in the code could still be beneficial for the D7 version of Update Status, but right now, I could not replicate the actual problem we were talking about in D6 so far.

#24

dww - June 5, 2008 - 01:33

Not that there are any hard numbers to go on in #23, but that supports my theory that part of the problem is the huge number of releases. In addition to some shared per-project data, each release of a module adds ~600 bytes to the XML file. I have no idea how much RAM the XML parser chews up when parsing that additional 600 bytes, but let's say pessimistically between holding the data itself, overhead from the XML parser, and update_status internal data structures, we're talking 1KB per release. As a general rule, D6 modules don't have nearly as many official releases as their corresponding D5 versions do.

<speculation class="wild">So, 45 modules in D6 might be ~100 total releases, whereas 45 modules in D5 might be ~400-800 releases. So, that could be roughly on the order of 500KB additional RAM, just from the extra releases.</speculation> ;) 1/2 meg on its own certainly doesn't explain what people are talking about here. Again, I'm guessing that in the D5 case (without all the RAM reduction benefits of module splitting in D6), that the site is hovering close to the RAM limit as it is, and with D6, it's back under the limit.

However, all this is guess-work. What we most need is someone with a good RAM profiling setup to actually drill into this and come up with some hard data for various configurations.
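The back-of-the-envelope estimate above, written out in Python. The 1 KB per release figure and the release counts are the guesses from this comment, not measurements; `extra_ram_kb` is just an illustrative helper.

```python
BYTES_PER_RELEASE = 1024  # pessimistic ~1 KB per release, as guessed above


def extra_ram_kb(d6_releases, d5_releases):
    """Extra RAM (in KB) attributable to the additional D5 releases."""
    return (d5_releases - d6_releases) * BYTES_PER_RELEASE / 1024


low = extra_ram_kb(100, 400)   # ~100 D6 releases vs. ~400 D5 releases
high = extra_ram_kb(100, 800)  # ~100 D6 releases vs. ~800 D5 releases
print(low, high, (low + high) / 2)
```

Under those guesses the extra cost lands between 300 KB and 700 KB, which is where the "roughly 500KB" figure comes from.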

#25

pwolanin - June 10, 2008 - 13:05

Ok, well then perhaps we can optimize by just throwing away information about old releases?

#26

Crell - June 10, 2008 - 15:34

I'd say a mixture of throwing out old release info we don't need and splitting up the cache to separate entries would go a long way toward reducing memory consumption. While that would increase the number of cache requests necessary, I don't think it's checked often enough or on enough pages that it would make a difference in normal operation.

#27

aclight - June 13, 2008 - 22:22

subscribing

#28

KingMoore - June 14, 2008 - 04:27

Why not let d.o provide this as a web service?

I have had a bit of a think about this over the week. Wouldn't it make a whole lot of sense if Drupal.org offered a web service where you could simply send d.o a module name and it would return to you the releases in XML format? Since Update Status is now in Core, the number of sites crawling d.o for releases is only going to go up. A dedicated service seems to be the way to go, and Update Status could then be patched to get its data in this way, rather than the current method. This should significantly lessen the load on both d.o and also the clients running the Update Status module.

#29

webchick - June 14, 2008 - 03:45

@KingMoore: Er. That's exactly what we have now?

http://updates.drupal.org/release-history/drupal/6.x

@panatlantica: Is there any way you could put a temporary hack in Update Status module's code to log the size of the XML files / current RAM usage in both 5.x and 6.x to compare?
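The kind of measurement asked for here can be illustrated in Python with the standard `tracemalloc` module; in the module's own PHP code the analogous calls would be `memory_get_usage()` and `memory_get_peak_usage()` placed around the fetch and parse step. This is only a sketch of the measuring technique, and `measure_parse` is a hypothetical helper, not something in Update Status.

```python
import tracemalloc
import xml.etree.ElementTree as ET


def measure_parse(xml_text):
    """Return (xml_bytes, peak_bytes) for parsing one release-history document."""
    tracemalloc.start()
    try:
        tree = ET.fromstring(xml_text)  # keep the tree alive while measuring
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(xml_text.encode("utf-8")), peak
```

Logging both numbers per project on a 5.x and a 6.x site would show how the size of the XML relates to the memory needed to hold it once parsed.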

#30

KingMoore - June 14, 2008 - 04:28

lol I thought it was crawling the d.o pages. Maybe that was D5. Doh. Glad to see this was implemented before I thought of it :D

#31

petey318 - June 25, 2008 - 06:44

subscribing - I'm running a number of D5.7 sites and just fell over the edge with one of them, which means that I will need to disable update_status on that site. That site runs approx 35 modules.

I think the idea of having a parameter to allow checking of [all, 10,20, whatever] modules at a time is an excellent one.

Best regards
Pete

#32

petey318 - June 30, 2008 - 22:56

Forgot to mention that all my D5 sites are running on shared hosts, so, as with #22, I am unable to make any parameter changes. So for now, I have had to turn off update status on all my sites, because it is unusable.

I don't know much about D6 yet, but if the behavior is the same (and is enabled by default) then I would think that this has the potential to kill D6 on shared hosts...

#33

MichaelK - July 9, 2008 - 19:43

subscribe

#34

pwolanin - July 10, 2008 - 01:15

@dww - looking at the information sent by d.o - it's sending information about all releases - is that needed? On the d.o side can we only build information about the most recent official and -dev release for each branch?

#35

dww - July 10, 2008 - 06:29

@pwolanin: No, because we don't know what version they're running and what version(s) are security updates. Trivial case:

5.x-1.4 is latest
5.x-1.3 was a security update

if they're running 5.x-1.3, we need to tell them to upgrade to 5.x-1.4 because they're out of date.
if they're running 5.x-1.2, we need to tell them to upgrade to 5.x-1.4 because they're missing a security update.

it gets even more complicated with:

a) different branches/major versions
b) releases marked as unsupported
...

:(
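The trivial case above can be written out as a small Python sketch. It deliberately ignores the complications listed under a) and b), the status labels are made up for illustration, and `classify` is a hypothetical function rather than the logic update_status actually uses.

```python
def classify(installed, releases):
    """Classify an installed version against a project's releases.

    releases is a list of (version, is_security_update) tuples, newest first.
    """
    versions = [version for version, _ in releases]
    if installed not in versions:
        return "unknown"  # the installed release is missing from the history
    position = versions.index(installed)
    if position == 0:
        return "current"
    newer = releases[:position]
    if any(is_security for _, is_security in newer):
        return "not secure"  # at least one newer release was a security update
    return "not current"  # newer releases exist, none of them security updates


history = [("5.x-1.4", False), ("5.x-1.3", True), ("5.x-1.2", False)]
print(classify("5.x-1.3", history))
print(classify("5.x-1.2", history))
```

This is why the full history matters: the answer for 5.x-1.2 depends on 5.x-1.3 having been a security update, even though 5.x-1.3 is neither the installed nor the latest release.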

#36

pwolanin - July 10, 2008 - 14:46

@dww - hmm, so we need some of this history, but maybe only the most recent version plus the most recent security fix?

#37

dww - July 10, 2008 - 16:50

@pwolanin: And the info about *any* releases that are marked as unsupported. And, we have to include both the most recent (could be a beta/rc), and the most recent "real" release. And all this info for every available branch (most recent, most recent security release), etc. And, we'd have to double-check that existing update_status clients won't start freaking out with bogus results if it only sees this data. In fact, I'm pretty sure we're screwed there, since if it can't find the currently installed release in the info about what's available, it thinks that's really bad and generates some kind of warning. So, we'd have to either provide the slimmed data at a new location that newer versions of the clients know where to fetch it, or give up on this line of attack.

Plus, it's at least possible other people are using this data for other things, and they'll be very sorry if all of a sudden, most of the data just disappears. That'd be another reason to provide the full data in the current location and the slimmed data somewhere else.

Of course, we probably can/should change things in the 7.x version (using the PHP5 XML parser might make it easier to use a more fancy XML schema, we might want to include less/different info, etc, etc) -- it'd be (relatively) easy to have the history generating scripts include different info in a different format in the .../7.x/* files it generates, and have the update.module in 7.x core expect and handle different data.

But, since this issue is about 6.x (and 5.x contrib), I don't really think it's viable to change the format of the data at the server.

Sorry,
-Derek

#38

pwolanin - July 10, 2008 - 16:56

@dww - if it's not viable, that's totally fine - I just want to consider the possibility. Though each module sends its version information, right? So a site with a 5.x-1.5 module doesn't need to know anything about versions before 1.5, right?

#39

dww - July 10, 2008 - 17:00

Yeah, I appreciate the attempt to explore other options, that's great...

"Though each module sends its version information, right?"

Not as of update_status 5.x-2.* and beyond. Each site only fetches data from a URL that includes the project name and the version of core. Sites never send specific info about what versions they're using. I could dredge up the issue numbers where all of this was originally discussed and decided if you want to read the history. ;)

#40

pwolanin - July 10, 2008 - 18:21

Really? Looking at this code it seems like the version is included: http://api.drupal.org/api/function/_update_build_fetch_url/6

and if I hack that function to drupal_set_message() each url, I get values like:

  • http://updates.drupal.org/release-history/drupal/6.x?site_key=1789&version=6.4-dev
  • http://updates.drupal.org/release-history/node_clone/6.x?site_key=1789&version=HEAD
  • http://updates.drupal.org/release-history/cvs_deploy/6.x?site_key=1789&version=6.x-1.0
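For reference, the shape of those URLs can be reproduced with a few lines of Python. This only mirrors the pattern visible in the three examples above and is not the implementation of `_update_build_fetch_url()`; `build_fetch_url` and its parameter names are hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://updates.drupal.org/release-history"


def build_fetch_url(project, core, site_key, version, base=BASE):
    """Build a release-history URL following the pattern in the examples above."""
    query = urlencode({"site_key": site_key, "version": version})
    return f"{base}/{project}/{core}?{query}"


print(build_fetch_url("cvs_deploy", "6.x", "1789", "6.x-1.0"))
```

The path identifies the project and core version, while the installed version travels only in the query string.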

#41

dww - July 24, 2008 - 07:38

Gah, I totally forgot we added that for tracking usage stats. ;) Tee hee.

#42

pwolanin - July 24, 2008 - 12:21

@dww - ok! Does the 5.x version work the same way?

Above you also talked about changing the internal way that update module processes data to keep less in memory. Which do you think is the faster/simpler fix?

#43

dww - July 24, 2008 - 15:52

There were all kinds of server-side performance issues with putting too many brains in the server to send the client only the info it needed to know. That's what led to the complete redesign for 5.x-2.* and the move to the XML files. I think it'd be better to just make the clients smarter about not saving/caching all the extra data, than reopening the can of worms about how updates.d.o serves up data in the first place.

#44

pwolanin - July 25, 2008 - 02:23

@dww - well, we should certainly look into the latter, but can't we just pre-build an XML file for each module/core version - or smarter, perhaps, build it for the most recent 2 or 3 releases of each project, and have the PHP wrapper that serves the XML files just fall back to the big (current) file if you have a version it can't find?

#45

dww - August 2, 2008 - 18:36

FYI: he didn't post it here, but chx did some memory profiling last night. Even with a D6 site with modules with the most releases (I queried the DB to help him with a list of modules sorted by the most official releases -- he had something like 11 modules and 1 theme that between them had nearly 200 releases), he discovered that update.module was a drop in the bucket of RAM consumption relative to other parts of D6 such as the theme layer, the menu system, and views. See #259479-34: Views needs a lot of memory for his initial findings. Hopefully he'll post here with more of the update.module details he discovered (he was just pasting various results into IRC). So, I'm not quite as worried about this issue as I used to be. There's still room for improvement, and maybe even still in D6, but I'm not convinced this is a "the sky is falling" kind of problem anymore...

#46

pwolanin - August 2, 2008 - 19:35

@dww - well, it seems like comments above suggest the module could not be used for D5 sites in some cases - since we are potentially looking at the number of releases for D6 modules increasing a lot over the coming months, I'm still wondering if we need to tackle this.

#47

dww - August 2, 2008 - 21:05

Yeah, I left it "critical". I still think it's worth fixing. But, I don't believe it's quite as dire as I initially feared, that's all. I still believe the basic plan I outlined in #14 is what needs to happen. We can consider other ways to change the data format and optimize in D7, but for D6 (and perhaps D5), we need something pretty close to #14, and see how much that helps.

BTW, for testing, it'd be fairly easy to generate some bogus release history files with 100s of releases per project, and point a test site at those (since you can specify the location of your release history in the .info file for each project). So, that'd be another way to test in D6 without waiting for lots of real releases.
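A quick way to produce such a bogus history file, sketched in Python. The element names used here (`project`, `short_name`, `api_version`, `releases`, `release`, `version`) are a simplified assumption and should be checked against a real release-history file before pointing a test site at the output; `fake_release_history` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def fake_release_history(project, core, count):
    """Build a synthetic release-history document with `count` releases."""
    root = ET.Element("project")
    ET.SubElement(root, "short_name").text = project
    ET.SubElement(root, "api_version").text = core
    releases = ET.SubElement(root, "releases")
    # Newest first, e.g. 6.x-1.300 down to 6.x-1.1.
    for minor in range(count, 0, -1):
        release = ET.SubElement(releases, "release")
        ET.SubElement(release, "version").text = f"{core}-1.{minor}"
    return ET.tostring(root, encoding="unicode")


xml_text = fake_release_history("bogus_module", "6.x", 300)
print(len(xml_text), "bytes of synthetic release history")
```

Generating a few files like this with different release counts would make it possible to see how memory use scales with the number of releases without waiting for real projects to accumulate them.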

#48

chx - August 4, 2008 - 11:28
Priority:critical» minor

I did a number of investigations and the memory that update status consumes is insignificant compared to the memory mere code inclusion eats.

#49

Gábor Hojtsy - August 4, 2008 - 11:31

chx: did your investigations include a considerable number of contrib modules used on the site, and/or modules with lots of releases?

#50

pwolanin - August 4, 2008 - 12:25

@chx - I'm a little surprised, considering that dww was saying there might be 400k sent per project (x 20 projects? on a big site)

#51

chx - August 4, 2008 - 14:45

Yes, there were no less than 178 releases. The total fetched XML is 105 767 bytes and parsing it takes 1 383 960 bytes. The site eats a whopping 25M on the empty homepage and 35M on admin/build/modules. That 1MB is nothing... I enabled all core modules and every module that could be enabled from the projects cck, contemplate, countdowntimer, dhtml_menu, everyblog, flashvideo, og, scheduler, views, weather and weblinks, and added the contrib theme fourseasons.
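Putting the measured numbers above into proportion, as a short Python calculation. The only assumption added here is that "35M" means 35 MiB; since there were at least 178 releases, the per-release figure is an upper bound.

```python
xml_bytes = 105_767      # total fetched XML, from the measurement above
parse_bytes = 1_383_960  # memory taken by parsing it
releases = 178           # "no less than" this many releases
page_bytes = 35 * 1024 * 1024  # admin/build/modules, assuming 35M means 35 MiB

expansion = parse_bytes / xml_bytes    # parsed size relative to raw XML
per_release = parse_bytes / releases   # upper bound on bytes per release
share = parse_bytes / page_bytes       # fraction of the whole page's memory

print(f"{expansion:.1f}x, {per_release:.0f} bytes/release, {share:.1%} of the page")
```

So the parsed data is about 13 times the size of the raw XML and several KB per release, well above the 1KB per release guessed in #24, yet still under 4% of the memory used by the page as a whole.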

 
 
