The Export Module is a great addition -- thanks for all the work on it! -- and I can *almost* get it to do what I want.

When I export a view to CSV, however, I get nothing but plain text -- any hrefs, paragraph breaks, etc., get lost in the translation.

A .DOC export is better -- most formatting is preserved inside the HTML table it creates -- but hrefs are stripped out of that file too.

Is there a way to preserve all the HTML formatting that might be in an exported field?

Thanks in advance for any help or suggestions.

Comment | File | Size | Author
#11 | views_bonus_csv_strip_html.patch | 2.2 KB | ahtih

Comments

pyc’s picture

I'm interested too; it seems like there's no solution... :(

ckng’s picture

Status: Active » Fixed

You can override theme_views_bonus_export_csv.
Warning: this will most likely break the CSV format, depending on what you have in your content.

Example (append to your theme's template.php, without the <?php ?> tags, which are only here to enable syntax highlighting):

<?php
function phptemplate_views_bonus_export_csv($view, $nodes) {
  if (!user_access('export views')) {
    return;
  }
  $fields = _views_get_fields();
  $output = ''; // initialize so the .= below does not append to an undefined variable

  // headings row
  $headings = array();
  foreach ($view->field as $field) {
    if ($fields[$field['id']]['visible'] !== false) {
      $headings[] = $field['label'] ? $field['label'] : $fields[$field['fullname']]['name'];
    }
  }
  $output .= implode(',', $headings) ."\r\n";

  // one row for each node
  foreach ($nodes as $node) {
    $values = array();
    foreach ($view->field as $field) {
      if ($fields[$field['id']]['visible'] !== false) {
        // render the field value the same way the view would
        $value = views_theme_field('views_handle_field', $field['queryname'], $fields, $field, $node, $view);

        /* comment out the next line to retain the HTML tags */
        $value = preg_replace('/<.*?>/', '', $value); // strip html tags

        $value = str_replace(array("\r", "\n", ','), ' ', $value); // strip line breaks and commas

        /* to retain the comma */
        // $value = str_replace(array("\r", "\n"), ' ', $value); // strip line breaks only

        $value = decode_entities($value);

        /* to use with node_import as they take in \" not "" */
        $value = str_replace('"', '\"', $value); // escape " characters

        $values[] = '"' . $value . '"';
      }
    }
    $output .= implode(',', $values) . "\r\n";
  }
  drupal_set_header('Content-Type: text/x-comma-separated-values');
  drupal_set_header('Content-Disposition: attachment; filename="view-'. $view->name .'.csv"');
  print $output;
  module_invoke_all('exit');
  exit;
}
?>

Other things you can do to clean up your data (taken from php.net):

<?php
        $badchr = array(
          "\xc2", // prefix 1
          "\x80", // prefix 2
          "\x98", // single quote opening
          "\x99", // single quote closing
          "\x8c", // double quote opening
          "\x9d"  // double quote closing
        );
        $goodchr = array('', '', '\'', '\'', '"', '"');
        
        $value = str_replace($badchr, $goodchr, $value); // escape ‘,’,“,” as ',',","
?>
<?php
        $find = array( 
          '“',  // left side double smart quote
          '†',  // right side double smart quote
          '‘',  // left side single smart quote
          '’',  // right side single smart quote
          '…',  // ellipsis
          '—',  // em dash
          '–'  // en dash
        );
      
        $replace = array('"', '"', "'", "'", "...", "-", "-");
        $value = str_replace($find, $replace, $value);
?>
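If the exported text is valid UTF-8, a variant of the cleanup above is to replace each smart-punctuation character as a whole multi-byte sequence rather than byte by byte, so other UTF-8 characters are left alone. This is only a sketch: normalize_smart_punctuation() is a made-up helper name, not part of Views Bonus Pack, and it assumes the input has not been mis-decoded on the way.

```php
<?php
// Sketch only: hypothetical helper, assumes $value is valid UTF-8.
function normalize_smart_punctuation($value) {
  $map = array(
    "\xe2\x80\x98" => "'",   // U+2018 left single quote
    "\xe2\x80\x99" => "'",   // U+2019 right single quote
    "\xe2\x80\x9c" => '"',   // U+201C left double quote
    "\xe2\x80\x9d" => '"',   // U+201D right double quote
    "\xe2\x80\xa6" => '...', // U+2026 ellipsis
    "\xe2\x80\x94" => '-',   // U+2014 em dash
    "\xe2\x80\x93" => '-',   // U+2013 en dash
  );
  // strtr() with an array replaces complete sequences in a single pass.
  return strtr($value, $map);
}
?>
```

In the override it would be called on $value before the quote escaping, for example $value = normalize_smart_punctuation($value);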
Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

seanr’s picture

Status: Closed (fixed) » Active

This no longer matches the current code in the 5.x branch. I'm unable to find any preg_replace or similar in that function now.

jeff h’s picture

Since this is very often going to be used as the data-export half of a data migration using the node_import module, I find it odd that we can't get a "cleaner" export. Maybe I am both naive and lucky :) but doing the following worked fine for me, with lots of nodes all with embedded full HTML, line breaks, etc.

Comment out the line:

$values[] = '"' . str_replace('"', '""', decode_entities(strip_tags($value))) . '"';

and replace with:

$values[] = $value;

Change the print statement at the end to:

print implode($comma, $values) ."%recordseparator%";

and change the $comma definition to:

$comma = t(',%CONTENTSEPARATOR%');

Then you can tell the node_import module that your field separator is ,%CONTENTSEPARATOR% (note the leading comma; I left that in so it is still, strictly speaking, a CSV, i.e. comma-separated, file), and use %recordseparator% as your record separator.

It worked fine for me. Hope it helps someone.

Jeff
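For anyone wanting to see the separator idea in isolation, here is a minimal standalone sketch in plain PHP (no Drupal). The two function names and the constants are made up for illustration; only the separator strings come from the comment above. It assumes neither marker ever occurs inside a field, and it says nothing about how node_import itself parses the file.

```php
<?php
// Sketch only: hypothetical helpers. Separator strings as suggested above.
define('EXPORT_FIELD_SEPARATOR', ',%CONTENTSEPARATOR%');
define('EXPORT_RECORD_SEPARATOR', '%recordseparator%');

// Join rows of raw field values, leaving HTML, quotes and line breaks intact.
function export_rows_with_markers(array $rows) {
  $output = '';
  foreach ($rows as $values) {
    $output .= implode(EXPORT_FIELD_SEPARATOR, $values) . EXPORT_RECORD_SEPARATOR;
  }
  return $output;
}

// Split the exported text back into rows. Completely empty records
// (including the one after the final separator) are skipped.
function import_rows_with_markers($text) {
  $rows = array();
  foreach (explode(EXPORT_RECORD_SEPARATOR, $text) as $record) {
    if ($record !== '') {
      $rows[] = explode(EXPORT_FIELD_SEPARATOR, $record);
    }
  }
  return $rows;
}
?>
```

Because the markers are unlikely to appear in real content, fields can contain commas, double quotes and line breaks without any escaping, which is what makes the HTML survive the round trip.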

neclimdul’s picture

Jeff is mostly right; that is the line you should change. It is in a theme function, so I would override it as you would any other theme function.

Just to be clear, paired quotes are the way it's documented in the CSV RFC (RFC 4180). I'm not sure why node_import wouldn't follow this.
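To make the difference between the two quoting styles concrete, here is a small sketch. Both helper names are hypothetical and neither is Views Bonus Pack or node_import API; the first follows the doubled-quote rule from RFC 4180, the second is the backslash form used in the override earlier in this thread.

```php
<?php
// Sketch only: both helper names are hypothetical.

// RFC 4180 style: wrap the field in double quotes and double any quote inside.
function csv_quote_rfc4180($value) {
  return '"' . str_replace('"', '""', $value) . '"';
}

// Backslash style: wrap the field in double quotes and prefix inner quotes
// with a backslash instead of doubling them.
function csv_quote_backslash($value) {
  return '"' . str_replace('"', '\"', $value) . '"';
}
?>
```

With the first form, a field that still contains HTML, commas or line breaks stays a single field for any reader that follows the RFC, since everything between the outer quotes belongs to that field.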

I'm also curious why this is an issue since node_import does its own exporting.

jeff h’s picture

I'm also curious why this is an issue since node_import does its own exporting.

Hmm, I don't believe node_import does exporting; please correct me if I'm wrong. There are several discussions in its issue queue about how to export data in the first place, and it seems to me that the only reliable batch-able way is creating a view with this module.

Jeff

neclimdul’s picture

My mistake, I seem to have been misled by something I was told about the module. I've reopened an issue in their queue regarding this as I believe it is actually a bug in node_import.
#273939: Correct Way to Escape single and double quotes (' & ")?

bryancasler’s picture

#5 this is exactly what I wanted, but I've tried your patches on the current 5.x dev build, 5.x-1.1 build and the 5.x-1.2-alpha2 build, yet none of those work.

The HTML output in the .doc files is exactly what I want, but I need it in CSV format so that I can import it. I know it's a year+ later, but would you mind looking at your code again and seeing if something might be missing from it?

dgastudio’s picture

+1

ahtih’s picture

Title: Can an CSV Export Preserve Links, Formatting, Etc. in Fields? » [PATCH] Can an CSV Export Preserve Links, Formatting, Etc. in Fields?
Status: Active » Needs review
File size: 2.2 KB

Here is a patch (against CVS HEAD) that adds an option to enable/disable HTML stripping. The default is "enable", i.e. the old functionality.
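As a rough illustration of what such an option amounts to (this is not the code from the attached patch), the per-field handling could look something like the sketch below. csv_prepare_value() and its $strip_html parameter are hypothetical names, and PHP's html_entity_decode() stands in for Drupal's decode_entities() so the snippet runs on its own.

```php
<?php
// Sketch only: hypothetical helper, not taken from the patch.
function csv_prepare_value($value, $strip_html = TRUE) {
  if ($strip_html) {
    // Old behaviour: drop the tags, then turn entities back into characters.
    $value = strip_tags($value);
    $value = html_entity_decode($value, ENT_QUOTES, 'UTF-8');
  }
  // Either way, double any quotes and wrap the field in quotes.
  return '"' . str_replace('"', '""', $value) . '"';
}
?>
```

Calling it with FALSE as the second argument keeps the markup, so links and paragraph tags end up in the CSV field unchanged.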

Pomliane’s picture

Status: Needs review » Closed (won't fix)

This version of Views Bonus Pack is not supported anymore. The issue is closed for this reason.
Please upgrade to a supported version and feel free to reopen the issue on the new version if applicable.

This issue has been automagically closed by a script.