We are running reports with 60,000+ records, and have noticed that during the render process, memory usage starts to snowball noticeably around the 30,000 mark.

We're attempting to run large XML reports using Forena.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jamesdixon’s picture

Version: 7.x-4.x-dev » 7.x-3.x-dev
jamesdixon’s picture

I'm not convinced this is a memory leak. If we're loading 144,000 records into a DOM structure, looping through all nodes in the dom, and adding them to an output variable, the report could easily eat up 6 gigabytes or so of RAM right?

Or is there a way we can free up some of that memory as we go?

The file that was generated is 13.5 megabytes large.

What I'm concerned about is that each DomNode we're passing into FrxReport->render_section($dom_node) may not be freed by PHP, or each time we grab the nodes we want with xpath, those nodes may not be freed up by PHP.

What do you think, is it reasonable that a report with 144,000 records would use up 6 Gigabytes of RAM to render inside FrxReport->render()?

metzlerd’s picture

I have never thought of this use case. I assume that this report is not for human consumption ( a browser cannot render html documents that big) and that you are using forena as an export tool? Are you trying to make an XML document this big? CSV? The memory footprint is a bit different depending on the document ouput format, and section_render is a potentially recursive function, so I'd need to know a bit more about structure of the report. Could you attach a .frx file?

jamesdixon’s picture

FileSize
738 bytes

You're right, we're generating CSV, so I bypass the html step, and generate the url for the csv by hand with the correct parameters.

Here is our .frx file, thanks for having a look.

I spent some time trying to free up memory on my own. I unset $child in the while loops in render_section, and unset the $nodes after use in render_section. I also tried calling gc_collect_cylcles to free up memory via garbage collection every 10,000 times render_section was run, but with no effect.

On our server, while generating the output for the report in the FrxReport.inc render function, the php memory usage grows at rate of about 25 megabytes per second.

From my research, with php 5.3 if there are circular references between objects, then php does not always free the memory up when it's done with it. I'm wondering if any of the DomNode objects or XPath objects are being created in memory inside the loops and not being freed.

My unset strategy isn't all that effective, as php 5.3 does not free up children of the unset object, so some sort of destructor routine may be needed for objects that aren't being freed up if there actually are memory issues.

Let me know if you have any ideas of where the memory leak may be coming from if it does in fact exist, I'm happy to jump in and try some things out in that render_section function.

metzlerd’s picture

It's a simple file, and it is being output to a table. The .csv is a two step process. The document is first rendered into html by the report, then reparsed and rendered to .csv.

You might try executing a "save as" on a link that outputs .html instead of .csv to see if the server process still runs away.

If it's XPATH issues, it is most likely the XPATHS in the token renderer that are the issue. FrxSyntaxEngine.inc does an xpath for every {} combination which is most likely the issue. As an experiment, try commenting out the xpath code in the replace method (let it just return the same value that it got) and see if the memory usage has an issue.

We could experiment with using a simple object property replacement rather than an xpath there (but only when the xpath is a simple expression) if that were the case which might bring the memory usage down. It's possilble also that the simplexmlelement xpath is creating an object which we wouldn't have control over. In that case I might have to swtch to the "DOM" method where I could reuse the same XPATH object every time. Most of Forena is designed using singleton objects to avoid this issue.

jamesdixon’s picture

Thanks for the advice, I will try executing the "save as" method, and commenting out the xpath code afterwards to see where we are at with the memory usage.

jamesdixon’s picture

FileSize
169.49 KB

After some fighting with this, we upgraded to Forena 4.x. It is better on memory usage (we can get bigger reports out without running out of memory), but it still runs out of memory for XML reports. I've done a stack trace of our XML report generation and used a script to analyze where most of the memory usage is with a screenshot of the results attached.

It looks like SimpleXML->addChild, and htmlspecialchars are using up quite a bit of memory.

This is the technique I used to pull the memory info.

I'm not sure if it's 100% accurate as SimpleXMLElement->addChild says it's using negative memory, but it's definitely using up most of the time.

I'll scope the code out and see where we're making those calls, and why they might be creating issues. Do you have any idea of why SimpleXMLElement->addChild and htmlspecialchars might be eating up memory?

Thanks for your guidance,

James

jamesdixon’s picture

I think we're probably getting hung up on line 118 of FrxDrupal.inc:

  /** 
   * Get data based on file data block in the repository.
   *
   * @param String $block_name
   * @param Array $parm_data
   * @param Query $subQuery
   */
  public function sqlData($sql, $options = array()) {
    $rs = $this->query($sql, $options);
    $entity_map = array();
    $select_fields = array();
    $rownum = 0;
    if (!$rs) return;

    $xml = new SimpleXMLElement('<table/>');
    if ($rs && $rs->columnCount()) foreach ($rs as $data) {
      $rownum++;
      $row_node = $xml->addChild('row');
        $row_node['num'] = $rownum;
        foreach ($data as $key => $value) {
          $row_node->addChild($key, htmlspecialchars($value));
        } 

When we're adding each column to the row from SQL we're running out of memory:

$row_node->addChild($key, htmlspecialchars($value));

I don't know if there's a way around this with the current code setup, as we're generating the XML in one SimpleXMLElement, and that SimpleXMLElement must be running out of space.

When the output of the XML file is only 50 megabytes large or so, I'm surprised that the SimpleXMLElement is growing to a size of around 6 Gigabytes.

Do you have any idea of why this may happen?

jamesdixon’s picture

I may have found the reason. Using htmlspecialchars with a SimpleXMLElement causes memory leaks. Looks to be a bug in PHP:

https://bugs.php.net/bug.php?id=39896

jamesdixon’s picture

This is an old bug that should have been fixed in PHP already. We are using PHP 5.3.10. They mention a workaround is to not use foreach loops, but for loops instead.

jamesdixon’s picture

I'm not convinced it's that particular bug we're running into. I converted from foreach to while loops and the memory usage is still increasing at the same rate. Maybe that $row_node variable is creating some circular references that are not being released.

metzlerd’s picture

Regarding #8. Is this where you still feel you are having he problem. #8 seems to contradict the issue summary... Is the problem in in the renderer or in the data blocks XML generation? I'm getting a bit confused as to where you think the memory usage is coming from.

metzlerd’s picture

Also, are you generating all the XML in a single column then? or are there large numbers of rows being queried from the database? It would also be helpful to know which database you are using as there may be some options.

jamesdixon’s picture

Issue summary: View changes

We are just trying to get large XML reports out of Forena in general at this point. The reason for the discrepency is we switched to Forena 4.x, and in 4.x that render code is different. I'll update the issue.

With regards to #8, I think it may be the Data Block XML generation, as the memory and time spent in that area of the code according to the xdebug profiler and analysis script I ran on it are the highest: (SimpleXMLElement->addChild, and htmlspecialchars)

https://drupal.org/files/issues/screenshot2_16.png

We are generating a number of columns per row in the XML, and running the report on about 100,000 rows.

We're using MySQL with InnoDB tables. Is that you were referring to by which database we're using?

Here is the main loop of our forena XML report:

     <root_tag frx:skip_root="true" id="forena-3" frx:block="dpisql/ACESInterchangeDelivery">
       <App frx:foreach="*" action="A" id="numeric">
         <BaseVehicle id="{basevehicle_vehicle_basevehicleid}"/>   <!-- {basevehicle_vehicle_yearid}; {make_basevehicle_makename}; {model_basevehicle_modelname} -->
         <BodyNumDoors id="{bodynumdoors_id}"/>  <!-- {bodynumdoors} -->
         <BodyType id="{bodytype_id}"/>   <!-- {bodytype_name} -->
         <DriveType id="{drivetype_id}"/>   <!-- {drivetype_name} -->
         <EngineBase id="{engineconfig_vehicletoengineconfig__enginebase_enginebaseid}"/>   <!-- V{engineconfig_vehicletoengineconfig__enginebase_cylinders}; {engineconfig_vehicl>
         <EngineDesignation id="{enginedesignation_id}"/>   <!-- {enginedesignation_name} -->
         <EngineVIN id="{enginevin_id}"/>   <!-- {enginevin_name} -->
         <EngineVersion id="{engineversion_id}"/>   <!-- {engineversion} -->
         <EngineMfr id="{engine_mfr_id}"/>   <!-- {engine_mfr_name} -->
         <ValvesPerEngine id="{valves_id}"/>   <!-- {valves_per_engine} -->
         <FuelDeliveryType id="{fueldeliverytype_id}"/>   <!-- {fueldeliverytype_name} -->
         <FuelDeliverySubType id="{fueldeliverysubtype_id}"/>   <!-- {fueldeliverysubtype_name} -->
         <Aspiration id="{aspiration_id}"/>   <!-- {aspiration_name} -->
         <CylinderHeadType id="{cylinderheadtype_id}"/>   <!-- {cylinderheadtype_name} -->
         <IgnitionSystemType id="{ignitionsystemtype_id}"/>   <!-- {ignitionsystemtype_name} -->
         <TransmissionMfrCode id="{transmissionmfrcode_id}"/>   <!-- {transmissionmfrcode} -->
         <TransmissionControlType id="{fitmentnotes_transmissioncontroltype_id}"/>  <!-- {fitmentnotes_transmissioncontroltype_name} -->
         <TransmissionNumSpeeds id="{transmissionnumspeeds_id}"/>   <!-- {transmissionnumspeeds} -->
         <Region id="{region_id}"/>   <!-- {region_name} -->
         <Note>{fitmentnote_field_aces_fit_value}</Note>
         <Qty>{qty_part_profile}</Qty>
         <PartType id="{part_terminology_id}"/>  <!-- {part_terminology_name} -->
         <Position id="{part_profile_position_id}"/>  <!-- {part_profile_position} -->
         <Part>{interchange_vendor_part_number}</Part>
       </App>
       <Footer>
         <RecordCount>{=count(*)}</RecordCount>
       </Footer>
     </root_tag>
jamesdixon’s picture

Version: 7.x-3.x-dev » 7.x-4.x-dev
metzlerd’s picture

I have been giving this a lot of thought and have a couple of possible approaches in mind. One would be to allow a configuration that would skip the xml rendering and build an associative array instead. Another involves more study into the memory cost of building xml and seeing if there are ways to mitigate that. There are other benefits to the first solution as the memory behavior is more straightforward. Unfortunately it will take me a bit to set up a patch for this. Ill get back to you when i have more info.

jamesdixon’s picture

Thank you, please let me know if I can help.

  • Commit c38d854 on 7.x-4.x:
    #2196009 Memory Leaks.
    
metzlerd’s picture

Status: Needs work » Needs review
FileSize
18.38 KB

I have done a significant amount of work here to try and reduce memory consumption. There were a couple of issues going on. First the SImpleXML xpath implementation appears to be be a bit of the culprit. There were some cases such as foreach="*" and simple token replacements where I could abandon the xpath expressions so I did. Also there appears to be some pretty large memory consumption going on with the input filter system. At first I tried using empty filters, but then elected to bypass the filter system for these reports all together. There are two new options for doing that. At the specific report level you can specify in the report, or in the .skinfo file you can specify input_format=none and it will bypass the filter system alltogether.

These changes have been commited and pushed, so if it's easier do a git pull to test them. I've tested to 60000 rows of the node table export, and am currently working to ramp up that test to 120000 rows. Memory consumption for the 60000 rows was approximately 70M by forena. I was able to generate csv, xml and xls files with a memory setting of 256M for php. I also needed to up the maximum execution time in order to process this on my laptop, but your milage may be different in your environments.

metzlerd’s picture

Woops the correct syntax for doing at the report level is:

<frx:options input_format="'/>

or

<frx:options input_format="none"/>
jamesdixon’s picture

I tried a couple times without luck, but realized I forgot to turn the input formats off. I'm running a report now, and added the following to my skininfo file:

input_format=none

Is that all that's needed, or do I need to add it like:

FrxReport[input_format] = true

The report will take a while so I'll check it in the morning. Thanks again for the help.

jamesdixon’s picture

Wow memory consumption has been reduced by a crazy amount by adding the option to disable input formats. With input formats turned on, our memory usage was jumping by 8 megabytes a second. Now it uses 8 megabytes every minute or two. I'll let you know if the report finishes in the morning.

I have high hopes it'll make it.

  • Commit c38d854 on 7.x-4.x, 7.x-4.x-features:
    #2196009 Memory Leaks.
    
metzlerd’s picture

Status: Needs review » Fixed

I'm going to tag this as complete. Have worked with James to bring down memory comsumption. We have managed to export close to 200,000 records with reasonable memory consumption. Going to have to call that good enough for now. If there are more issues feel free to reopen.

jamesdixon’s picture

This has been a major improvement in performance. Our massive 170,000 report runs in 25 minutes now. Thank you!

dpovshed’s picture

James & David, this is a great achievement, let me thank you on behalf of all Forena users! :)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

freddy2677’s picture

There a method that generates me the report in excel or cvs and would not show first html

metzlerd’s picture

@freddy677: Not understanding your questions, but I'd love for you to open a new issue instead if there is a problem that you are struggling with solving.