I noticed that my cron hadn't successfully run in a couple of days, so I started looking for an explanation.

When I checked the error log, I found this:

[Fri Jun 19 11:31:06 2009] [error] [client 127.0.0.1] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 24 bytes) in /var/www/html/connect/includes/database.mysqli.inc on line 144
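
For reference, the byte count in that message is PHP's memory limit expressed in bytes. A quick conversion, as a minimal Python sketch (the variable names are only illustrative):

```python
# Convert the byte count quoted in the PHP fatal error into megabytes.
limit_bytes = 134217728               # figure taken from the error log
limit_mb = limit_bytes / (1024 * 1024)
print(f"{limit_mb:.0f}MB")
```

So the limit in effect when cron failed was 128MB.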

When I disabled the XML Sitemap modules, the error went away and cron started completing successfully.

Let me know if there's any other information I can provide.

Version: XML sitemap 6.x-2.x-dev (2009-Jun-14)

Enabled Modules: XML sitemap, XML sitemap engines, and XML sitemap node

Comments

dave reid’s picture

Going over 128MB, well crap. Could you let me know the result of the following SQL queries and questions?

SELECT language, COUNT(pid) as pid_count FROM url_alias GROUP BY language
SELECT type, COUNT(id) as id_count FROM xmlsitemap WHERE status = 1 GROUP BY type
SELECT COUNT(n.nid) FROM node n LEFT JOIN xmlsitemap x ON x.type = 'node' AND n.nid = x.id WHERE (x.id IS NULL OR x.status IS NULL) ORDER BY n.changed DESC

- Are you generating the sitemap for more than one language?
- What's your minimum sitemap lifetime?

kmillecam’s picture

We did recently enable language support but I'm only generating sitemaps in English.
Minimum sitemap lifetime is set to "no minimum".
Number of links in each sitemap page, "1000".
Maximum number of sitemap links to process at once, "100".

mysql> SELECT language, COUNT(pid) as pid_count FROM url_alias GROUP BY language;
+----------+-----------+
| language | pid_count |
+----------+-----------+
|          |    560225 | 
| en       |      3462 | 
+----------+-----------+
2 rows in set (0.37 sec)

mysql> SELECT type, COUNT(id) as id_count FROM xmlsitemap WHERE status = 1 GROUP BY type;
+-----------+----------+
| type      | id_count |
+-----------+----------+
| frontpage |        1 | 
| node      |    89230 | 
+-----------+----------+
2 rows in set (0.10 sec)

mysql> SELECT COUNT(n.nid) FROM node n LEFT JOIN xmlsitemap x ON x.type = 'node' AND n.nid = x.id WHERE (x.id IS NULL OR x.status IS NULL) ORDER BY n.changed DESC;
+--------------+
| COUNT(n.nid) |
+--------------+
|           26 | 
+--------------+
1 row in set (0.16 sec)

dave reid’s picture

Would it be possible for you to temporarily adjust your PHP's memory limit up to 180MB and then test if the cron completes successfully? Or, if you want to be really helpful, you could adjust down from 180MB in 10MB increments to find where the limit error occurs again.
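
The step-down test described here could be scripted roughly as follows. This is a minimal Python sketch, not something from the module: `cron_succeeds` is a hypothetical callback that would set the memory limit, run cron and report success, and `fake_cron` with its 150MB requirement is a made-up stand-in so the example runs on its own.

```python
def find_failure_threshold(cron_succeeds, start_mb=180, step_mb=10):
    """Step the limit down from start_mb.

    Returns (lowest limit that still passed, first limit that failed).
    Either value is None if it was never observed.
    """
    last_ok = None
    limit = start_mb
    while limit > 0:
        if not cron_succeeds(limit):
            return last_ok, limit
        last_ok = limit
        limit -= step_mb
    return last_ok, None


# Hypothetical stand-in for a real cron run: pretend cron needs 150MB.
def fake_cron(limit_mb):
    return limit_mb >= 150


threshold = find_failure_threshold(fake_cron)
print(threshold)
```

The real requirement then lies somewhere between the two returned values, so a second pass with a smaller step would narrow it further.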

kmillecam’s picture

Sure Dave,

It might be early next week before I can pull it off but I'll do some testing and report back.

kmillecam’s picture

Hi Dave,

Jeremy Andrews from Tag1 Consulting did some testing (I'll include the results below).

We ended up increasing our available memory to 154M but this wasn't entirely because of XML Sitemap. It appears that we ran into memory problems only when XML Sitemap and Notifications tried to perform tasks during the cron run.

******************

I've run cron quite a few times -- the amount that each module needs can fluctuate quite a bit, so it wouldn't be surprising for one module to need more than normal and cause xmlsitemap to fail. But the next time cron runs I would expect it to succeed.

One useful piece of information gleaned from this is how much RAM is being consumed when xmlsitemap runs and rebuilds everything: ~108M.

Here's another run where it ran out of memory because notifications fired off at the same time:

16:41:55: before (5,673.695 K) calling notifications_cron
16:42:12: after (28,676.695 K)
...
16:42:54: before (43,745.867 K) calling xmlsitemap_cron
Out of memory.

I ran it again without any notifications firing, and it completed fine. This also shows that the xmlsitemap_engines module is not the issue:

16:45:04: before (18,595.578 K) calling xmlsitemap_cron
16:45:20: after (129,496.797 K)
16:45:20: before (129,497.141 K) calling xmlsitemap_engines_cron
16:45:22: after (129,517.250 K)
16:45:22: before (129,517.250 K) calling xmlsitemap_node_cron
16:45:22: after (129,407.430 K)
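
The per-hook memory change can be read off a log like the one above by pairing each "before" line with the "after" line that follows it. A minimal Python sketch, assuming the log format shown here (the regular expression and the `hook_deltas` helper are illustrative, not part of any module):

```python
import re

LOG = """\
16:45:04: before (18,595.578 K) calling xmlsitemap_cron
16:45:20: after (129,496.797 K)
16:45:20: before (129,497.141 K) calling xmlsitemap_engines_cron
16:45:22: after (129,517.250 K)
16:45:22: before (129,517.250 K) calling xmlsitemap_node_cron
16:45:22: after (129,407.430 K)
"""

# timestamp, "before"/"after", memory in K, optional hook name
LINE = re.compile(r"^\S+ (before|after) \(([\d,.]+) K\)(?: calling (\w+))?$")


def hook_deltas(log):
    """Return {hook name: memory change in K} for each before/after pair."""
    deltas = {}
    hook, before = None, None
    for line in log.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        kind, amount, name = m.groups()
        kb = float(amount.replace(",", ""))
        if kind == "before":
            hook, before = name, kb
        elif hook is not None:
            deltas[hook] = round(kb - before, 3)
            hook = None
    return deltas


deltas = hook_deltas(LOG)
for name, kb in deltas.items():
    print(f"{name}: {kb:+,.3f} K ({kb / 1024:+.1f} MB)")
```

On this run xmlsitemap_cron accounts for roughly 108 MB of growth, in line with the ~108M figure above, while the other two hooks barely move the total.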

Something else of interest. The module breaks things into "chunks". I had assumed that each chunk would take roughly the same amount of RAM, but that's not the case:

16:48:02: before (17,776.477 K) calling xmlsitemap_cron
0 of 91, using 17,947.516 K
1 of 91, using 17,948.141 K
2 of 91, using 128,705.234 K
3 of 91, using 128,705.430 K
4 of 91, using 128,705.859 K
...
89 of 91, using 128,722.492 K
90 of 91, using 128,722.688 K
16:48:09: after (128,725.813 K)

These numbers show the memory usage before calling xmlsitemap_generate(), so it was the second chunk "#1" that used all the memory.
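
Because each reading is taken before its chunk runs, the cost of chunk i is the reading before chunk i+1 minus the reading before chunk i. A minimal Python sketch using the first few readings from the log above (the `chunk_costs` helper is illustrative):

```python
# Memory readings in K, taken before each chunk is generated (from the log).
readings = {
    0: 17947.516,
    1: 17948.141,
    2: 128705.234,
    3: 128705.430,
    4: 128705.859,
}


def chunk_costs(readings):
    """Cost of chunk i = reading before chunk i+1 minus reading before chunk i."""
    idx = sorted(readings)
    return {i: readings[j] - readings[i] for i, j in zip(idx, idx[1:]) if j == i + 1}


costs = chunk_costs(readings)
heaviest = max(costs, key=costs.get)
print(f"chunk #{heaviest} used about {costs[heaviest] / 1024:.1f} MB")
```

The other chunks in this sample each cost well under 1 K, so essentially all of the growth comes from chunk #1.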

Based on this information, we'll need to set memory_limit in php.ini to just above 150M for it to always run successfully.

dave reid’s picture

I had been working on a kind of PHP memory limit calculator based on the number of URL aliases and the chunk size. Using over-estimates, for the numbers you provided in #2 my figures said about 170MB would be needed, so 150MB for the "real" necessary memory limit seems about right. I was thinking about adding a hook_requirements check to warn people how much memory will be needed when cron is run. Does that seem reasonable, or would you have any other ideas, Kevin? I'm sure I can figure out how to reduce the memory requirement a little, but it's such an expensive operation to run.
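
As a back-of-the-envelope illustration of that kind of estimate, here is a minimal Python sketch built only from figures posted in this thread. It is not the calculator described above: `estimate_limit_mb` is a hypothetical helper, it assumes all of the growth across the expensive chunk comes from loading URL aliases, and it leaves chunk size out because the thread gives no per-link cost.

```python
def estimate_limit_mb(alias_count, baseline_kb, bytes_per_alias):
    """Rough peak memory in MB: memory already in use plus the cost of the aliases."""
    alias_mb = alias_count * bytes_per_alias / (1024 * 1024)
    return baseline_kb / 1024 + alias_mb


# Figures taken from this thread.
alias_count = 560225 + 3462                     # rows in url_alias
jump_kb = 128705.234 - 17948.141                # growth across the expensive chunk
bytes_per_alias = jump_kb * 1024 / alias_count  # assumption: all growth is aliases

# Baseline = memory in use just before xmlsitemap_cron was called.
quiet = estimate_limit_mb(alias_count, 17776.477, bytes_per_alias)  # no notifications
busy = estimate_limit_mb(alias_count, 43745.867, bytes_per_alias)   # after notifications_cron
print(f"{bytes_per_alias:.0f} bytes/alias, quiet run {quiet:.0f} MB, busy run {busy:.0f} MB")
```

Under these assumptions a quiet run stays below the 128MB limit, while a run that follows a busy notifications_cron lands just above 150 MB, which is consistent with the test results reported above.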

dave reid’s picture

It also makes sense that the first chunk to be generated is the index file, so that won't take a lot of memory. After the first actual page has been generated, we have fetched all the aliases from {url_alias}, which is why the memory usage has jumped up substantially.

dave reid’s picture

Status: Active » Fixed

I just committed the optimal memory limit calculator and changer (if set_time_limit() works) as part of http://drupal.org/cvs?commit=230920. The module will also add some memory limit debugging info for the xmlsitemap regeneration. I'm going to mark this as fixed.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.