We're attempting to run large XML reports using Forena. We are running reports with 60,000+ records, and have noticed that during the render process, memory usage starts to snowball noticeably around the 30,000-record mark.
Comment | File | Size | Author
---|---|---|---
#19 | forena-memory.patch | 18.38 KB | metzlerd
#7 | screenshot2.png | 169.49 KB | jamesdixon
#4 | all_mapped_applications.tgz | 738 bytes | jamesdixon
Comments
Comment #1
jamesdixon CreditAttribution: jamesdixon commented

Comment #2
jamesdixon CreditAttribution: jamesdixon commented

I'm not convinced this is a memory leak. If we're loading 144,000 records into a DOM structure, looping through all of the nodes in the DOM, and adding them to an output variable, the report could easily eat up 6 gigabytes or so of RAM, right?
Or is there a way we can free up some of that memory as we go?
The generated file is 13.5 megabytes in size.
What I'm concerned about is that each DOMNode we're passing into FrxReport->render_section($dom_node) may not be freed by PHP, or that each time we grab the nodes we want with XPath, those nodes may not be freed either.
What do you think: is it reasonable that a report with 144,000 records would use 6 gigabytes of RAM to render inside FrxReport->render()?
Comment #3
metzlerd CreditAttribution: metzlerd commented

I have never thought of this use case. I assume that this report is not for human consumption (a browser cannot render HTML documents that big) and that you are using Forena as an export tool? Are you trying to make an XML document this big? CSV? The memory footprint differs a bit depending on the document output format, and render_section is a potentially recursive function, so I'd need to know a bit more about the structure of the report. Could you attach a .frx file?
Comment #4
jamesdixon CreditAttribution: jamesdixon commented

You're right, we're generating CSV, so I bypass the HTML step and generate the URL for the CSV by hand with the correct parameters.
Here is our .frx file, thanks for having a look.
I spent some time trying to free up memory on my own. I unset $child in the while loops in render_section, and unset the $nodes after use in render_section. I also tried calling gc_collect_cycles() every 10,000 times render_section was run to free up memory via garbage collection, but with no effect.
On our server, while generating the output for the report in the FrxReport.inc render function, PHP memory usage grows at a rate of about 25 megabytes per second.
From my research, with PHP 5.3, if there are circular references between objects, PHP does not always free the memory when it's done with it. I'm wondering if any of the DOMNode objects or XPath objects are being created in memory inside the loops and not being freed.
My unset strategy isn't all that effective, as PHP 5.3 does not free up the children of an unset object, so some sort of destructor routine may be needed for objects that aren't being freed, if there actually are memory issues.
Let me know if you have any ideas about where the memory leak may be coming from, if it does in fact exist; I'm happy to jump in and try some things out in that render_section function.
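As a rough illustration of the mitigation attempted above — unsetting loop variables as soon as possible and forcing a collection cycle periodically — here is a minimal sketch. The function name and loop body are hypothetical stand-ins for Forena's render_section, not the module's actual code:

```php
<?php
// Hypothetical sketch: drop DOM references inside the render loop and
// periodically run PHP 5.3+'s cycle collector. render_rows() is an
// illustrative stand-in, not Forena code.
function render_rows(DOMNodeList $nodes) {
  $output = '';
  $count = 0;
  foreach ($nodes as $node) {
    $output .= $node->textContent;
    unset($node);              // release our reference as soon as possible
    if (++$count % 10000 === 0) {
      gc_collect_cycles();     // ask PHP to reclaim cyclic garbage
    }
  }
  return $output;
}
```

As the comment above notes, this only helps if the objects are actually unreachable; unset on a parent does not cascade to children that something else still references.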
Comment #5
metzlerd CreditAttribution: metzlerd commented

It's a simple file, and it is being output to a table. The .csv is a two-step process: the document is first rendered into HTML by the report, then reparsed and rendered to .csv.
You might try executing a "save as" on a link that outputs .html instead of .csv to see if the server process still runs away.
If it's an XPath issue, it is most likely the XPath queries in the token renderer. FrxSyntaxEngine.inc does an XPath query for every {} combination, which is the most likely culprit. As an experiment, try commenting out the XPath code in the replace method (let it just return the same value that it got) and see if the memory usage still grows.
If that were the case, we could experiment with using a simple object property replacement rather than an XPath query (but only when the XPath is a simple expression), which might bring the memory usage down. It's also possible that the SimpleXMLElement xpath is creating an object which we wouldn't have control over. In that case I might have to switch to the "DOM" method, where I could reuse the same XPath object every time. Most of Forena is designed using singleton objects to avoid this issue.
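The "simple property replacement instead of XPath" idea above can be sketched roughly as follows. This is not FrxSyntaxEngine's actual code — the function name and regex are assumptions for illustration:

```php
<?php
// Hypothetical sketch of the optimization discussed above: only fall
// back to a full XPath evaluation when the {token} is a real expression;
// for a bare field name, read the child element directly, which is much
// cheaper per call.
function replace_token(SimpleXMLElement $row, $token) {
  if (preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $token)) {
    // Simple field name: direct property access, no XPath machinery.
    return (string) $row->$token;
  }
  // Anything more complex: pay the cost of an XPath query.
  $matches = $row->xpath($token);
  return $matches ? (string) $matches[0] : '';
}
```

Since most report tokens are plain column names, the cheap branch would handle the vast majority of the per-row work.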
Comment #6
jamesdixon CreditAttribution: jamesdixon commented

Thanks for the advice. I will try executing the "save as" method, and then commenting out the XPath code to see where we are at with the memory usage.
Comment #7
jamesdixon CreditAttribution: jamesdixon commented

After some fighting with this, we upgraded to Forena 4.x. It is better on memory usage (we can get bigger reports out without running out of memory), but it still runs out of memory for XML reports. I've done a stack trace of our XML report generation and used a script to analyze where most of the memory usage is; a screenshot of the results is attached.
It looks like SimpleXML->addChild, and htmlspecialchars are using up quite a bit of memory.
This is the technique I used to pull the memory info.
I'm not sure if it's 100% accurate as SimpleXMLElement->addChild says it's using negative memory, but it's definitely using up most of the time.
I'll scope the code out and see where we're making those calls, and why they might be creating issues. Do you have any idea of why SimpleXMLElement->addChild and htmlspecialchars might be eating up memory?
Thanks for your guidance,
James
Comment #8
jamesdixon CreditAttribution: jamesdixon commented

I think we're probably getting hung up on line 118 of FrxDrupal.inc:
When we're adding each column to the row from SQL we're running out of memory:
$row_node->addChild($key, htmlspecialchars($value));
I don't know if there's a way around this with the current code setup, as we're generating the XML in one SimpleXMLElement, and that SimpleXMLElement must be running out of space.
When the output XML file is only around 50 megabytes, I'm surprised that the SimpleXMLElement grows to a size of around 6 gigabytes in memory.
Do you have any idea of why this may happen?
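One way to sidestep building the whole document in a single SimpleXMLElement is to stream rows straight to disk with XMLWriter, so memory stays flat regardless of row count. This is a hedged sketch of an alternative approach, not what Forena actually does; the function name is hypothetical:

```php
<?php
// Hypothetical sketch: stream each row to the output file as it is
// produced instead of accumulating everything in one in-memory tree.
function write_rows_streamed($filename, array $rows) {
  $w = new XMLWriter();
  $w->openUri($filename);
  $w->startDocument('1.0', 'UTF-8');
  $w->startElement('table');
  foreach ($rows as $row) {
    $w->startElement('row');
    foreach ($row as $key => $value) {
      // writeElement() escapes special characters itself, so no
      // separate htmlspecialchars() call is needed here.
      $w->writeElement($key, $value);
    }
    $w->endElement();
    $w->flush();   // push buffered output to the file, freeing memory
  }
  $w->endElement();
  $w->endDocument();
  $w->flush();
}
```

The trade-off is that a streamed document cannot be queried with XPath afterwards, which matters for a renderer that re-reads the data block.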
Comment #9
jamesdixon CreditAttribution: jamesdixon commented

I may have found the reason. Using htmlspecialchars with a SimpleXMLElement causes memory leaks. It looks to be a bug in PHP:
https://bugs.php.net/bug.php?id=39896
Comment #10
jamesdixon CreditAttribution: jamesdixon commented

This is an old bug that should already have been fixed in PHP. We are using PHP 5.3.10. They mention that a workaround is to use for loops instead of foreach loops.
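For reference, the foreach-to-for workaround mentioned above looks something like this — an index-based loop over a SimpleXMLElement avoids the per-iteration iterator objects the bug report blames (illustrative data, not the report's actual code):

```php
<?php
// Sketch of the workaround: iterate SimpleXMLElement children by index
// with a for loop instead of foreach.
$xml = new SimpleXMLElement('<rows><v>1</v><v>2</v></rows>');
$count = count($xml->v);
$sum = 0;
for ($i = 0; $i < $count; $i++) {
  $sum += (int) $xml->v[$i];   // index access instead of foreach
}
```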
Comment #11
jamesdixon CreditAttribution: jamesdixon commented

I'm not convinced it's that particular bug we're running into. I converted from foreach to while loops and the memory usage is still increasing at the same rate. Maybe that $row_node variable is creating some circular references that are not being released.
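The circular-reference suspicion above can be demonstrated in miniature: two objects that point at each other are never freed by plain reference counting when they go out of scope, and only PHP 5.3's cycle collector reclaims them. The Node class here is purely illustrative:

```php
<?php
// Minimal demonstration of a reference cycle that refcounting alone
// cannot free. Node is an illustrative class, not Forena code.
class Node {
  public $peer;
}

function make_cycle() {
  $a = new Node();
  $b = new Node();
  $a->peer = $b;
  $b->peer = $a;   // reference cycle: refcounts can never reach zero
}                   // $a and $b leave scope here, but the pair lingers

gc_collect_cycles();               // clear any pre-existing garbage first
make_cycle();
$collected = gc_collect_cycles();  // reclaims the cycle; returns a count
```

If $row_node really does sit in such a cycle, unset alone would not release it, which would match the behavior described above.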
Comment #12
metzlerd CreditAttribution: metzlerd commented

Regarding #8: is this where you still feel you are having the problem? #8 seems to contradict the issue summary. Is the problem in the renderer or in the data block's XML generation? I'm getting a bit confused as to where you think the memory usage is coming from.
Comment #13
metzlerd CreditAttribution: metzlerd commented

Also, are you generating all the XML in a single column, then? Or are there large numbers of rows being queried from the database? It would also be helpful to know which database you are using, as there may be some options.
Comment #14
jamesdixon CreditAttribution: jamesdixon commented

We are just trying to get large XML reports out of Forena in general at this point. The reason for the discrepancy is that we switched to Forena 4.x, and in 4.x that render code is different. I'll update the issue.
With regards to #8, I think it may be the data block XML generation, as the memory and time spent in that area of the code (SimpleXMLElement->addChild and htmlspecialchars), according to the xdebug profiler and the analysis script I ran, are the highest:
https://drupal.org/files/issues/screenshot2_16.png
We are generating a number of columns per row in the XML, and running the report on about 100,000 rows.
We're using MySQL with InnoDB tables. Is that what you were referring to when you asked which database we're using?
Here is the main loop of our Forena XML report:
Comment #15
jamesdixon CreditAttribution: jamesdixon commented

Comment #16
metzlerd CreditAttribution: metzlerd commented

I have been giving this a lot of thought and have a couple of possible approaches in mind. One would be to allow a configuration that would skip the XML rendering and build an associative array instead. Another involves more study into the memory cost of building XML and seeing if there are ways to mitigate that. There are other benefits to the first solution, as its memory behavior is more straightforward. Unfortunately it will take me a bit to set up a patch for this. I'll get back to you when I have more info.
Comment #17
jamesdixon CreditAttribution: jamesdixon commented

Thank you, please let me know if I can help.
Comment #19
metzlerd CreditAttribution: metzlerd commented

I have done a significant amount of work here to try to reduce memory consumption. There were a couple of issues going on. First, the SimpleXML XPath implementation appears to be a bit of the culprit. There were some cases, such as foreach="*" and simple token replacements, where I could abandon the XPath expressions, so I did. There also appears to be some pretty large memory consumption going on in the input filter system. At first I tried using empty filters, but then elected to bypass the filter system for these reports altogether. There are two new options for doing that: you can specify it at the specific report level in the report, or in the .skinfo file you can specify input_format=none, and it will bypass the filter system altogether.
These changes have been committed and pushed, so if it's easier, do a git pull to test them. I've tested to 60,000 rows of the node table export, and am currently working to ramp that test up to 120,000 rows. Memory consumption by Forena for the 60,000 rows was approximately 70 MB. I was able to generate .csv, .xml and .xls files with a memory setting of 256M for PHP. I also needed to raise the maximum execution time in order to process this on my laptop, but your mileage may vary in your environments.
Comment #20
metzlerd CreditAttribution: metzlerd commented

Whoops, the correct syntax for doing this at the report level is:
or
Comment #21
jamesdixon CreditAttribution: jamesdixon commented

I tried a couple of times without luck, but realized I had forgotten to turn the input formats off. I'm running a report now, and added the following to my .skinfo file:
input_format=none
Is that all that's needed, or do I need to add it like:
FrxReport[input_format] = true
The report will take a while so I'll check it in the morning. Thanks again for the help.
Comment #22
jamesdixon CreditAttribution: jamesdixon commented

Wow, memory consumption has been reduced by a crazy amount by adding the option to disable input formats. With input formats turned on, our memory usage was jumping by 8 megabytes per second. Now it uses 8 megabytes every minute or two. I'll let you know in the morning whether the report finishes.
I have high hopes it'll make it.
Comment #24
metzlerd CreditAttribution: metzlerd commented

I'm going to tag this as complete. I have worked with James to bring down memory consumption, and we have managed to export close to 200,000 records with reasonable memory usage. I'm going to call that good enough for now. If there are more issues, feel free to reopen.
Comment #25
jamesdixon CreditAttribution: jamesdixon commented

This has been a major improvement in performance. Our massive 170,000-record report runs in 25 minutes now. Thank you!
Comment #26
dpovshed CreditAttribution: dpovshed commented

James & David, this is a great achievement. Let me thank you on behalf of all Forena users! :)
Comment #29
freddy2677 CreditAttribution: freddy2677 commented

Is there a method that generates the report in Excel or CSV for me without first showing the HTML?
Comment #30
metzlerd CreditAttribution: metzlerd commented

@freddy2677: I'm not understanding your question, but I'd love for you to open a new issue instead if there is a problem that you are struggling to solve.