Project:Drupal.org Project applications
Component:module
Category:task
Priority:normal
Assigned:Unassigned
Status:needs work

Issue Summary

PURPOSE
-------

The purpose of this module is to provide a way to move data between Drupal websites.

If the module is installed it is then very easy to export nodes, taxonomy terms or users to data files. These files can then be imported into a Drupal site which has the module installed.

This makes it easy to move what we would normally think of as data (nodes, taxonomy terms, users) between Drupal sites.

Key points:

* When nodes/terms/users etc are imported they are assigned the same ID number as they had when they were exported.
* Files attached to nodes via attachements or CCK fields are exported to the data file and imported to the receiving site.
* Checks are carried out to ensure vocabularies exist and that content types have matching fields etc.
* Batch API is used on node export/import to deal with large datasets.
* After an import the data in the receiving site is an exact match to the data which was exported to the file. In principle this works like rsync with the --delete option.
* Easy to test. Install on test Drupal site, export nodes/terms/users to files, delete some data, import the data file, admire the nodes etc which re-appear.
* A drush interface has been added so this module can be used from the command line.

Images which show part of the admin interface are available here.

OVERVIEW
--------

Data export and import is not easy to achieve as Drupal stores configuration and data in the same database. Also, Drupal can be extended and customised which makes creating a standard data export/import module effectively impossible.

This module provides profiles for exporting/importing the main data items; nodes, taxonomy terms and users. These profiles make it possible to export these items into dataset files and then import these files into a different Drupal site.

This module is customisable and extendable by the addition of new profiles to take into account different Drupal builds. This means that if a Drupal instance has a very specialised set-up it will be possible to create a new dataset profile to export/import data.

As standard the following data items can be exported and imported.

* Taxonomy terms.
* Nodes, selectable by content type.
* Users.

When exporting the data is saved in files which are placed into the current files/ directory. When importing the data is used in a way which makes sure that the receiving Drupal instance ends up with matching data. Careful checks are used to ensure ID's match and that the data remains consistent.

Effectively this module is similar to the backup_migrate module except that the backup_migrate profiles are extended. Currently the backup_migrate profiles only allow for the choice of which tables will be exported/imported and a few other settings. This module uses the Druapl API to read data and produce rich data files which contain all data needed to reproduce a particular dataset. It also uses the Drupal API to recreate the data.

The primary usage for this module is envisioned to be exporting datasets from a live Drupal site and importing this live data into a new updated version of the site which developers have been working on - i.e. a beta version. Once the import has completed then the newly developed site can become the new live site. This has the advantage of being able to fall back to the original live site if there is an unforseen problem between the new version of the site and the live data.

Very importantly, this module is also extensible with additional profiles enabling the export/import of data of any format. Because Drupal can be extended and customised in any way it may be that certain sites contain exotic content types or even very specific custom data. Users will be able to create profiles which will be able to export/import for their specific site in conjunction with this module.

It is hoped that it will be possible to set up a way for module users to be able to share profiles so they can be re-used or extended as required.

WHY THIS IS DIFFERENT FROM EXISTING IMPORT/EXPORT MODULES
---------------------------------------------------------

Used as part of a new type of deployment plan
---------------------------------------------

The focus of this module is to provide a tool to help with the way Drupal sites are developed, tested and deployed.

Most deployment plans push updates from dev to testing to live. This means that it is not possible to safely carry out large changes. If a site has been extensively redeveloped then there is a risk that when the changes are pushed on to live the site will break. Of course, it's possible to test on a copy of the live site as much as possible and it is possible to have a secondary copy of the live site to fall back on if the new live site breaks.

A better approach is to separately develop the new copy of the site as a beta copy - changes made by developers can be pushed up to a central beta copy of the site. This site can then be extensively developed and can be completely different from the existing live site. Using the Data Export Import module it will then be possible to export live data (users, nodes, taxonomy terms etc) from the existing live site and import that data into the beta site.

When the beta is ready to be deployed a final export of the live site data can then be imported to the beta site - and the beta site set to be the new live site. The advantages are:

* The beta site can be extensively tested with a full copy of the live data before going live.
* The existing live site is untouched and can be re-enabled if the beta site has any serious issues.

Designed to be easily extensible
--------------------------------

Because Drupal can be extended and customised in any way then this module needs to be extensible. Existing delpoyment modules depend on module developers to extend their module to fit in with the deployment module.

The Data Export Import module is designed to be easily extensible.

For example, it may be the case that a Drupal site has been heavily customised and modified and contains very specific types of data. This could be exotic content types of it could mean data which is specific to a custom module.

Using Data Export Import it will be possible for the site developers to add on new profiles which are specifically designed to export and import that site's data. For exmaple, it will be possible to add on a profile called 'Buggins Widgets delivery schedules profile' which will export/import data specific for that complany's website.

Obviously, certain profiles are provided as standard; nodes, taxonomy terms and users.

It is planned that there will be a system of sharing profiles which may be of use to others - either used directly or to be built on and extended.

No existing module is suitable
------------------------------

After very extensive research into the existing import/export modules it was found that none of them were able to export and import data in a suitable way.

* Deploy module.

The big underlying assumption with the deploy module is:

'Deploy 6.x-1.x assumes that you have two exact clones of you code and configuration across your staging site and your production site (expect the Deploy and Services specific stuff). You can't have anything that differs the sites from each other. Nothing.'

The Data Export Import module is designed to allow for data to be exported to dataset files which can then be imported into an different (e.g. beta) version of a site.

Also, Deploy transmits data directly via the services module. The Data Export Import module exports to dataset files which can be transferred in different ways - and data can be re-imported back in to the exporting instance which is useful for testing purposes. I.e. Developers can work on a site and make drastic changes to taxonomy terms and nodes etc and when testing is finished the original data can be re-imported from a dataset file.

* Features module

The Features module is focussed on exporting/importing configuration settings. The Data Export Import module is focussed on exporting and importing content type of data such as nodes, taxonomy terms, users etc.

* Stager

Focussed on pushing from staging to live. Also it is Drupal 7.x only.

* Backup and Migrate

The Backup and Migrate module works at the database table level. Since there are so many interdependencies between tables it is not possible to cleanly extract datasets in a way which means they can be re-imported. The Data Export Import module works at the API level to recreate datasets exactly.

* Staging

From the staging project page:

'This module do not work very well on sites that contains user generated content, on live site. So if you need this modules's name or url or what ever, let me know.'

* Migrate

The primary focus of the Migrate module is the importing of data from non-Drupal sources. Also, it depends on several modules including, Dbtng, elements, autoload etc. It also relies on Drush and although Drush is an excellent tool it is not used by all developers.

The Data Export Import module is designed to export/import Drupal data and is designed to be a standalone module which just uses the Drupal API to carry out it's functions and should not need to rely on other modules. Also, the user interface will be set up as cleanly as possible to enable relatively inexperienced Drupal developers to export and import datasets.

The Migrate module also requires that module developers add extensions to their modules. The Data Export Import module does not need any other extensions to any other modules. Indeed, the Data Export Import module is designed to be extensible itself - so developers can add custom profiles to match their exact site requirements.

PROJECT PAGE
------------

http://drupal.org/sandbox/bailey86/1278830

GIT REPOSITORY
--------------

http://drupal.org/node/1278830/commits

DRUPAL VERSION
--------------

Currently built for Drupal 6.

Comments

#1

Status:needs review» needs work

It appears you are working in the "master" branch in git. You should really be working in a version specific branch. The most direct documentation on this is Moving from a master branch to a version branch. For additional resources please see the documentation about release naming conventions and creating a branch in git.
Review of the master branch:

  • Run coder to check your style, some issues were found (please check the Drupal coding standards). See attachment.
  • Comments: there should be a space after "//", see http://drupal.org/node/1354#inline
    test.php:17://echo "dataset file: ".$dataset_file."\n";
  • ./includes/taxonomy_terms.inc: comment lines should break at 80 characters, see http://drupal.org/node/1354#general
                                       // 'depth' => $file_content_as_array['vocabulary_terms'][$existing_vocabulary_id][$existing_term->tid]->depth,
                                     // 'depth' => $file_content_as_array['vocabulary_terms'][$existing_vocabulary_id][$existing_term->tid]->depth,
  • ./data_export_import.module: comment lines should break at 80 characters, see http://drupal.org/node/1354#general
                                                                //'description' => 'Export a dataset.',
                                                                //'page callback' => 'data_export_import_callback',
                                                                // 'page arguments' => array('', 'backup_migrate_ui_manual_backup_quick', TRUE),
                                                                //'access arguments' => array('access content'),
                                                                //'description' => 'Import a dataset',
                                                                // 'page arguments' => array('', 'backup_migrate_ui_manual_restore', TRUE),
                                                                     //'description' => 'Set where datasets are stored',
                                                                     // 'page arguments' => array('', 'backup_migrate_ui_manual_restore', TRUE),
                                                                 //'description' => 'Dataset profiles',
                                                                 // 'page arguments' => array('', 'backup_migrate_ui_manual_restore', TRUE),
                                                                  //'description' => 'Scheduled exports and imports',
                                                                  // 'page arguments' => array('', 'backup_migrate_ui_manual_restore', TRUE),
      //$form['file_upload'] = array('#title' => t('Upload file'), '#type'  => 'file');
      //$form['submit_upload'] = array('#type'  =>  'submit', '#value'  =>  'Submit');
  • Comments should be on a separate line before the code line, see http://drupal.org/node/1354#inline
    ./includes/taxonomy_terms.inc:264:  } // foreach vocabulary - this is the test loop to check that the
    ./includes/taxonomy_terms.inc:471:      } // Found a term in the dataset which matches on term id and
    ./includes/taxonomy_terms.inc:474:    } // Looping through existing terms.
    ./includes/taxonomy_terms.inc:480:  } // Looping through the existing vocabularies
    ./includes/taxonomy_terms.inc:523:                                 'vid' => $term_from_dataset->vid, // Voacabulary ID
    ./includes/taxonomy_terms.inc:549:      } // Block to deal with term needing to be created.
    ./includes/taxonomy_terms.inc:551:    } // Looping through the dataset terms.
    ./includes/taxonomy_terms.inc:557:  } // Looping through the dataset vocabularies.
    ./includes/taxonomy_terms.inc:570:} // import_vocabularies function.
    ./data_export_import.module:28:  $output = '';  //declare your output variable
    ./data_export_import.module:36:  } // function data_export_import_help
  • ./includes/taxonomy_terms.inc: all functions should be prefixed with your module/theme name to avoid name clashes. See http://drupal.org/node/318#naming
    function taxonomy_save_term_with_defined_tid(&$form_values) {
  • ./includes/taxonomy_terms.inc: The description on the line after the @param/@return documentation is either missing or not formatted correctly. See http://drupal.org/node/1354#functions
    652- *  Internal use only.
  • ./data_export_import.module: The description on the line after the @param/@return documentation is either missing or not formatted correctly. See http://drupal.org/node/1354#functions
    25-   */
  • There should be no space after the opening "(" of an array, see http://drupal.org/node/318#array
    includes/taxonomy_terms.inc:67:      if (in_array( $term->tid, $list_of_terms)) {
  • Remove all old CVS $Id tags, they are not needed anymore.
    data_export_import.info:1:; $Id$

This automated report was generated with PAReview.sh, your friendly project application review script. Please report any bugs to klausi.

AttachmentSize
coder-result.txt 5.55 KB

#2

Thanks for the review.

I've created a new branch called 6.x-1.x - is this now OK? I haven't deleted the master branch yet.

Is deleting the master branch crucial? I see the code to do this is the following - but thought I'd check RE the best way to follow the standard etc.

git checkout master
rm -R ./*
echo See major version branches.> README.txt
git add -A
git commit -am "Removed files from deprecated master branch."

I've found the switch to make the coder module stricter - I've run it and tidied up the code until no errors show up. However, it may be that your version of coder is better - mine does not pick up the errors RE t().

Will be grateful to have any further feedback.

Thanks.

#3

Status:needs work» needs review

I've set this to 'needs review'. Can I take it that this is what I'm supposed to do after fixing up the points raised by the reviewer.

#4

This has now been run through coder_tough_love and no errors show.

Extra stubs have been put in place for the next dataset profiles.

Currently it is ready to test:
* Install module.
* Export taxonomy terms.
* Change something with some taxonomy terms. NB - Use a test vocabulary or change terms which are not crucial.
* Go to 'Import' tab and import the dataset file which was just exported and which should be listed there.
* Go back to the taxonomy terms and see them magically restored to as they were when exported.

#5

Hi,

I've set this to 'needs review'. Can I take it that this is what I'm supposed to do after fixing up the points raised by the reviewer.

Yes, this is exactly it. ;)

Automated review (Please keep in mind that this is primarily a high level check that does not replace but, after all, eases the review process. There is no guarantee that no other issues could show up in a more in-depth manual follow-up review.)

Review of the 6.x-1.x branch:

  • Run coder to check your style, some issues were found (please check the Drupal coding standards):
    Severity minor, Drupal Commenting Standards, Internationalization, Drupal Security Checks, Drupal SQL Standards, Drupal Coding Standards

    sites/all/modules/pareview_temp/test_candidate/data_export_import.module:
    +35: [normal] String concatenation should be formatted with a space separating the operators (dot .) and the surrounding terms
    +42: [minor] Comment should be read "Implements hook_foo()."

    Status Messages:
    Coder found 1 projects, 1 files, 1 normal warnings, 1 minor warnings, 0 warnings were flagged to be ignored
  • ./includes/profiles/taxonomy_terms.inc: comment lines should break at 80 characters, see http://drupal.org/node/1354#general
            // Currently changed from chop() to rtrim() as recommended by coder_tough_love.
  • Comments should be on a separate line before the code line, see http://drupal.org/node/1354#inline
    ./includes/profiles/taxonomy_terms.inc:517:                                 'vid' => $term_from_dataset->vid, // Voacabulary ID
    ./includes/profiles/pages.inc:40:  $orig = node_load($node_id); // load original
    ./includes/profiles/pages.inc:47:  // $node = $orig;       // copy for mods
    ./includes/profiles/pages.inc:49:  // //  $orig->body = 'xxxxqwertyasdasdasd12345678'; // Do some changes
    ./includes/profiles/articles.inc:40:  $orig = node_load($node_id); // load original
    ./includes/profiles/articles.inc:47:  // $node = $orig;       // copy for mods
    ./includes/profiles/articles.inc:49:  // //  $orig->body = 'xxxxqwertyasdasdasd12345678'; // Do some changes
  • All text files should end in a single newline (\n). See http://drupal.org/node/318#indenting
    ./test.php ./data_export_import.info ./README.txt ./includes/profiles/taxonomy_terms.inc ./includes/profiles/pages.inc ./includes/profiles/articles.inc

This automated report was generated with PAReview.sh, your friendly project application review script. Go and review some other project applications, so we can get back to yours sooner.

Manual review

#6

Status:needs review» needs work

#7

Status:needs work» needs review

Code has been tidied and run past pareview_sh and coder (with coder_tough_love).

Master branch code has been pruned to a single README.txt file as advised.

#8

Status:needs review» needs work

Hi,

as I just extended my review environment with the drupalcs script, here's what comes out now:

Review of the 6.x-1.x branch:

  • Drupal Code Sniffer has found some code style issues (please check the Drupal coding standards):
    FILE: ...ites/all/modules/pareview_temp/test_candidate/data_export_import.module
    --------------------------------------------------------------------------------
    FOUND 9 ERROR(S) AND 16 WARNING(S) AFFECTING 25 LINE(S)
    --------------------------------------------------------------------------------
       4 | ERROR   | The second line in the file doc comment must be " * @file"
      25 | WARNING | Line exceeds 80 characters; contains 86 characters
      34 | ERROR   | Line indented incorrectly; expected 4 spaces, found 2
      35 | ERROR   | Line indented incorrectly; expected at least 6 spaces, found 4
      39 | ERROR   | Closing brace indented incorrectly; expected 0 spaces, found 2
      61 | WARNING | A comma should follow the last multiline array item. Found:
         |         | MENU_NORMAL_ITEM
      67 | WARNING | A comma should follow the last multiline array item. Found:
         |         | MENU_DEFAULT_LOCAL_TASK
      74 | WARNING | A comma should follow the last multiline array item. Found:
         |         | MENU_LOCAL_TASK
      81 | WARNING | A comma should follow the last multiline array item. Found:
         |         | MENU_LOCAL_TASK
      88 | WARNING | A comma should follow the last multiline array item. Found:
         |         | MENU_LOCAL_TASK
      95 | WARNING | A comma should follow the last multiline array item. Found:
         |         | MENU_LOCAL_TASK
    195 | WARNING | A comma should follow the last multiline array item. Found:
         |         | FALSE
    205 | WARNING | A comma should follow the last multiline array item. Found:
         |         | FALSE
    215 | WARNING | A comma should follow the last multiline array item. Found:
         |         | FALSE
    260 | WARNING | A comma should follow the last multiline array item. Found:
         |         | FALSE
    287 | WARNING | A comma should follow the last multiline array item. Found:
         |         | $options
    297 | WARNING | A comma should follow the last multiline array item. Found:
         |         | FALSE
    324 | WARNING | A comma should follow the last multiline array item. Found:
         |         | $options
    336 | WARNING | A comma should follow the last multiline array item. Found:
         |         | FALSE
    362 | WARNING | A comma should follow the last multiline array item. Found:
         |         | $options
    393 | ERROR   | File is being conditionally included; use "include_once"
         |         | instead
    394 | ERROR   | File is being conditionally included; use "include_once"
         |         | instead
    461 | ERROR   | File is being conditionally included; use "include_once"
         |         | instead
    462 | ERROR   | File is being conditionally included; use "include_once"
         |         | instead
    463 | ERROR   | File is being conditionally included; use "include_once"
         |         | instead
    --------------------------------------------------------------------------------


    FILE: ...all/modules/pareview_temp/test_candidate/includes/profiles/articles.inc
    --------------------------------------------------------------------------------
    FOUND 2 ERROR(S) AFFECTING 2 LINE(S)
    --------------------------------------------------------------------------------
      4 | ERROR | The second line in the file doc comment must be " * @file"
    43 | ERROR | String concat is not required here; use a single string instead
    --------------------------------------------------------------------------------


    FILE: ...es/all/modules/pareview_temp/test_candidate/includes/profiles/pages.inc
    --------------------------------------------------------------------------------
    FOUND 1 ERROR(S) AFFECTING 1 LINE(S)
    --------------------------------------------------------------------------------
    4 | ERROR | The second line in the file doc comment must be " * @file"
    --------------------------------------------------------------------------------


    FILE: ...dules/pareview_temp/test_candidate/includes/profiles/taxonomy_terms.inc
    --------------------------------------------------------------------------------
    FOUND 18 ERROR(S) AND 2 WARNING(S) AFFECTING 20 LINE(S)
    --------------------------------------------------------------------------------
       4 | ERROR   | The second line in the file doc comment must be " * @file"
    134 | ERROR   | Closing brace indented incorrectly; expected 0 spaces, found 2
    208 | ERROR   | Missing space after the //
    210 | ERROR   | Missing space after the //
    213 | ERROR   | Missing space after the //
    228 | ERROR   | Space found before comma in function call
    237 | ERROR   | Missing space after the //
    243 | ERROR   | Missing space after the //
    273 | ERROR   | Missing space after the //
    279 | ERROR   | Missing space after the //
    282 | ERROR   | Missing space after the //
    287 | ERROR   | Missing space after the //
    341 | ERROR   | Missing space after the //
    344 | ERROR   | Missing space after the //
    348 | ERROR   | Missing space after the //
    369 | ERROR   | Missing space after the //
    373 | ERROR   | Missing space after the //
    445 | WARNING | A comma should follow the last multiline array item. Found:
         |         | relations
    523 | WARNING | A comma should follow the last multiline array item. Found:
         |         | relations
    621 | ERROR   | A cast statement must be followed by a single space
    --------------------------------------------------------------------------------

This automated report was generated with PAReview.sh, your friendly project application review script. Go and review some other project applications, so we can get back to yours sooner.

hth,
dave

#9

Status:needs work» needs review

Thanks for the previous review.

Code now passes code, pareview_sh and code sniffer!

Export and import of taxonomy terms and pages is now complete.

Extra blank pages have been commented out to keep the interface cleaner.

Docs have been added to to explain how to download and upload the dataset files.

Some stubs of code have been left in to start the export/import of another content type - these have been commented out in a compliant way. Next I will look to move the development of new content types to a development branch.

#10

Code has been updated, cleaned up, run through all coding standards tests.

Obscure bug related to updating pages has been fixed.

#11

Priority:normal» major

Hi,

This was set to 'needs review' over two weeks ago so I've taken the liberty to raise the priority level as per the guidelines.

Currently the code is polished and users can export then import taxonomy terms and pages. Should all be working fine and the interface is clean and easy to use.

I'm currently working on another branch to enable the export and import of all content types - but I'd like to get this project reviewed/tested/approved etc before merging in this new work as a development branch.

Thanks in advance for any reviews carried out.

#12

Priority:major» minor

Just dropped the priority back down - the pareview_sh module is reporting functions as needing to have the module name at the beginning of all functions. Unfortunaltely, this looks like it is also flagging up internal functions (those which start with an underscore).

#13

Priority:minor» major

All was fine - the code review tools were picking up things from the new branch which is not in this release. Will repost the reason I raised the priority to major.

This was set to 'needs review' over two weeks ago so I've taken the liberty to raise the priority level as per the guidelines.

Currently the code is polished and users can export then import taxonomy terms and pages. Should all be working fine and the interface is clean and easy to use.

I'm currently working on another branch to enable the export and import of all content types - but I'd like to get this project reviewed/tested/approved etc before merging in this new work as a development branch.

Thanks in advance for any reviews carried out.

#14

Priority:major» minor

The online service at:

http://ventral.org/pareview

has found some errors - will be working through them now. Will move the priority back up to major after the code has been tidied.

Hopefully this online service is a one-stop code review process now and will make the process of testing much simpler.

#15

Priority:minor» major

The review at:

http://ventral.org/pareview

is now being passed with now errors. This online test includes pareview and code sniffer - and I think pareview runs coder as well.

#16

Status:needs review» needs work

I have reviewed this module manualy.
But perhaps should some one other review the taxonomy part againe, i was gone tired will review. Perhaps not found all.
Nearly all i found is fixed within the attached patch.

I found some coding style issues.

1.)
http://drupal.org/coding-standards#array
place closing brackets at new line
for me its easy da read and avoid problems will add new array element

2.)
array elements shoulb be idneted with 2 spaces

3.)
i dont like debug stuff in commited code, but this is my personal opinion
if ($debug) {

Some errors:

1.)
function data_export_import_import_form()
file: data_export_import.module
line: 197
$directory = "sites/default/files/data_export_import/pages";
remove this line, ovoid hard coded pathes
not every one have this module in sites/default

2.)
files missing
profiles/pages.inc
profiles/taxonomy_terms.inc

3.)
avoid include_once use module_load_include() insted

4.)
function data_export_import_export_form_submit()
file: data_export_import.module
line: 320
$dataset_file = data_export_import_export_articles()
call a not declared function

5.)
include filename in t() in other languages perhaps it not have to be on the end af string
check_plain(t("The Articles dataset was output to the file:") . $dataset_file)

t('The Articles dataset was output to the file: @dataset_file', array('@dataset_file', $dataset_file))

6.)
function data_export_import_import_form_submit()
file: data_export_import.module
line: 403
trigger buttons with not exist make no sens
if ($form_state['clicked_button']['#post']['taxonomy_terms'] != 'none') {

how to trigger button clicks within drupal:
http://api.drupal.org/api/drupal/includes--form.inc/function/_form_butto...

7.)
function data_export_import_export_pages()
file: page.inc
line: 26
if it is an object, use it as object

8.)
function data_export_import_export_pages()
file: page.inc
line: 44
if you will do this with an content type, with more as 100 nodes, you will run out of php max execution time!
it is realy not usable to save all nodes to same file
http://api.drupal.org/api/drupal/includes--form.inc/group/batch/7

9.)
function data_export_import_export_pages()
file: page.inc
line: 60
avoid serializie, use var_export or json_encode insted
why use serializie when there is no need to recreate classes, it is overkill
json_encode() or var_export() ar mutch faster. The featuresm odule an some other prever var_export()
if you are afraid of securety issues or white page dies, use json_encode()

10.)
function data_export_import_export_pages()
file: page.inc
line: 156
dont compare arryas to each other.
better compare json or serialized strings. Bett hte md5`s of them

11.)
function data_export_import_export_pages()
file: page.inc
line: 171
t() dont need ony check_plain() bevor
http://api.drupal.org/api/drupal/includes--common.inc/function/t/6

12.)
function data_export_import_export_pages()
file: page.inc
line: 171
if a drupal_set_message() is an, error, dont prefix it with "ERROR -", set second parameter "error"

13.)
function data_export_import_export_pages()
file: page.inc
line: 205
count($file_content_as_array['nodes'])==0
better use if(empty($file_content_as_array['nodes']))
for better performance and error handling

14.)
function data_export_import_export_taxonomy_terms()
file: taxonmy_terms.inc
line: 35
directly use vid is dangerous, cause perhaps them are not equal on diferent machines
better use vocabular machine name for identify
specizialy in the case when you have several developers for one project

/**
* Get a taxonomy object by its name
* @param string $name The name of the taxonomy
* @return stdClass The vocabulary object
*/
function get_taxonomy_by_name($name) {

  $vocabularies =  taxonomy_get_vocabularies();
  foreach($vocabularies as $vid => $vocabulary) {
    if ($vocabulary->name == $name) {
      return $vocabulary;
    }
  }
  return null;
}

15.)
file: data_export_import.info
the dependencies are missing

15.)
$synonym_list .= $synonym . "\n";
is faster to read than
$synonym_list = $synonym_list . $synonym . "\n";

questions

1.)
Overrides the standard drupal_write_record() function.
function data_export_import_drupal_write_record_via_insert_with_defined_id()
why you have had to do that?

2.)
Why is data_export_import_taxonomy_get_tree_with_reset() better than
http://api.drupal.org/api/drupal/modules--taxonomy--taxonomy.module/func...

3.)
did you have tested the taxonomy import / export with transalated terms?

AttachmentSize
0001-changes-from-code-review.patch 31.83 KB

#17

Status:needs work» needs review

Thanks for the review.

All points have been worked through - I've attached a document which outlines all the details of the work carried out and responses to the points raised.

Module fully passes the coding standards review at: http://ventral.org/pareview/httpgitdrupalorgsandboxbailey861278830git

Module is fully functional - users can export and import all taxonomy terms - and can export/import all standard page nodes.

Once the module is accepted I will be able to work on the next phase which is to export/import all content types and users.

AttachmentSize
response_to_review_20111214_2.txt 9.06 KB

#18

Priority:major» normal
Status:needs review» needs work

Review of the 6.x-1.x branch

  • data_export_import_help(): This is a hook implementation, so the first line in the comment must be "Implements hook_help().". Please also check your other hooks. See http://drupal.org/node/1354#hookimpl
  • data_export_import_callback_export(): if you just want to return a form on your page callback you can use "drupal_get_form" directly in hook_menu(). Same for data_export_import_callback_import().
  • "'#value' => 'Create dataset files',": Make sure that all user facing text runs through t() for translation.
  • "$directory = "sites/default/files/data_export_import/taxonomy_terms";": still a hard-coded path.
  • "drupal_set_message(t("The data export functionality is not currently compatible with localization."), 'error');": that message is not correct, it should be "internationalization (i18n)"
  • "$form_state['clicked_button']['#post']['pages']": do not access raw POST data, use $form_state['clicked_button']['values'] instead.
  • test.php: I don't think this file has any value in the repository. Also you use drush integration if you want to do something from the command line with Drupal.
  • data_export_import_export_pages(): This will only work for a rather small amount of nodes, for larger sites you will run into a timeout or a memory issue. Batch API helps with that http://api.drupal.org/api/drupal/includes--form.inc/group/batch/6
  • I think node_import is very similar to your approach, why does it not work for you? http://drupal.org/project/node_import
  • Re-implementing node_save() and drupal_write_record() really scares me. The migrate module for example just sets the "is_new" property on a new node with an id that should be preserved, but I think that only works in Drupal 7. http://drupal.org/node/1382932#comment-5400362
  • "echo "Term not saved correctly. Term id:" . $existing_term->tid . "\n";": Remove all echo statements from your code and replace them with proper messages.

#19

Status:needs work» needs review

Thanks for the review. All points have been fixed or addressed so the module is ready for review again.

  • data_export_import_help(): This is a hook implementation, so the first line in the comment must be "Implements hook_help().". Please also check your other hooks. See http://drupal.org/node/1354#hookimpl
  • >>> REPLY
    Comments have been changed.

  • data_export_import_callback_export(): if you just want to return a form on your page callback you can use "drupal_get_form" directly in hook_menu(). Same for data_export_import_callback_import().
  • >>> REPLY
    I'd prefer to leave them in separate functions as later I may add some checks which would be carried out before the current forms are loaded.

  • "'#value' => 'Create dataset files',": Make sure that all user facing text runs through t() for translation.
  • >>> REPLY
    The t() function has been added to 'Create dataset files' and elsewhere where needed. Menu items have been left as they are as pareview says 'Menu item titles and descriptions should NOT be enclosed within t()'.

  • "$directory = "sites/default/files/data_export_import/taxonomy_terms";": still a hard-coded path.
  • >>> REPLY
    Fixed.

  • "drupal_set_message(t("The data export functionality is not currently compatible with localization."), 'error');": that message is not correct, it should be "internationalization (i18n)"
  • >>> REPLY
    Fixed.

  • "$form_state['clicked_button']['#post']['pages']": do not access raw POST data, use $form_state['clicked_button']['values'] instead.
  • >>> REPLY
    Fixed.

  • test.php: I don't think this file has any value in the repository. Also you use drush integration if you want to do something from the command line with Drupal.
  • >>> REPLY
    Fixed - the file has been removed.

  • data_export_import_export_pages(): This will only work for a rather small amount of nodes, for larger sites you will run into a timeout or a memory issue. Batch API helps with that http://api.drupal.org/api/drupal/includes--form.inc/group/batch/6
  • >>> REPLY
    I've currently tested this module with 5,000 nodes and 5,000 taxnomy terms and the export and import ran fine. I would like to change the code to run functions under Batch API and I've looked closely at Batch API to see how it can be used.

    I'm currently being sponsored to create this module for a company I am working for. If the module is accepted as a full Drupal module then they may be more likely to sponsor it further and myself or one of their full-time coders can add in the Batch API functionality.

  • I think node_import is very similar to your approach, why does it not work for you? http://drupal.org/project/node_import
  • >>> REPLY
    I looked very closely at node_import.

    The key/main/important difference is that this module will import the nodes and assign them the same ID number as they had when they were exported.

    Also, it will only update or create the nodes (or taxonomy terms) necessary to make them match the nodes/terms which were exported. So if you have a dataset of 1,200 nodes - and the receiving instance already holds the first 1,000 nodes - then only the new 200 nodes will be imported. Node Import will re-import all the nodes and will create them with new ID numbers. So before any import it would be necessary to delete all the existing nodes.

    Think of this module is terms of 'rsync --delete' for terms and nodes.

    If the terms and nodes have the same ID number as the original data then any cross references and hard-coded links will still work OK. The sites I'm working on are huge multimedia sites with dozens of custom modules. And dozens of editors/reporters etc - and many of the content types (approx 17) use the node ID in their path.

    This module is to be used as part of a whole dev/staging/live deployment process. I've written about the process in the description but basically the dev's all push site functionality/design changes up to a common beta site - and using this module live data can be exported from the current live site and imported into the beta site. The beta site becomes the new live site when all has been tested.

    I.e. - using this module you pull data down from a live site to a beta rather than pushing code/module changes from a beta site to a live site.

  • Re-implementing node_save() and drupal_write_record() really scares me. The migrate module for example just sets the "is_new" property on a new node with an id that should be preserved, but I think that only works in Drupal 7. http://drupal.org/node/1382932#comment-5400362
  • >>> REPLY
    Agreed - as you say that is D7 only - so I've had to override the current function to get the functionality needed in D6.

  • "echo "Term not saved correctly. Term id:" . $existing_term->tid . "\n";": Remove all echo statements from your code and replace them with proper messages.
  • >>> REPLY
    Fixed.

    FINAL NOTES.

    If you check out the all_content_types branch you'll see that I now have the module able to cleanly export/import any of the content types on a Drupal instance - they get listed automatically and the dataset files get named automatically. Standard CCK fields work fine. I am currently working on exporting then re-importing any files which have been attached to a node or are from a CCK field (FileField or ImageField).

    If this module is shown to be accepted as part of the Drupal module list my current employer may be happy to sponsor further development either by myself or by one of their own full-time coders. Currently the 6.x-1.x branch export/imports taxonomy terms and pages - I feel that the terms import/export on it's own would be very useful to many. Further down the line we're looking to add a profile to export/import users as well.

    Please note - the way the module is designed on the new all_content_types branch allows for site devs to create their own completely custom profiles by adding a single file containing the relevant functions. This will allow the export/import of ANY data for any site no matter how specialised any added modules are.

    FURTHER WORK

    As this module addresses a big issue for Drupal - the data content migration issue - I'm hoping it will be of use to the wider Drupal community who may wish to work with it and enhance it further.

    CODE STANDARDS

    I've run the code through the automated tests at http://ventral.org/pareview and it's looking fine. I'm sure you remember from the discussion I generated at http://groups.drupal.org/node/195303 about the coding standards tools!

    Thanks for your review - I await any further feedback if you find the time.

    And thanks for the heroic effort on the modules queue - it was getting quite large and seemed to be becoming a log jam.

#20

The branch to export and import all content types has now been merged in.

This module will now detect all the site's content types and offer to export/import them. Files attached via the Upload module or by FileField or ImageField CCK fields will be output to the dataset file and can then be re-imported. There may be issues if the files are larger than 1MB - but I'm currently working to deal with larger files. If needed I may use batch API - but currently this module has been tested with 5,000 nodes in a dataset and all is well.

Using Batch API is obviously the way forward but may not be needed by my current employer.

#21

There is now a branch which exports/imports using the batch API. The branch is called handle_large_files.

I will now clean up the code and make the batch API messages nicer. Once the code is compliant I will merge in the handle_large_files branch to the 6.x-1.x branch.

#22

Priority:normal» major

The main 6.x-1.x branch is now using the Batch API to deal with large datasets.

#23

This module now exports and imports users.

#24

This issue description has been updated to reflect that fact that the module is ready to be used out of the box for Nodes, Taxonomy Terms and Users.

#25

A Drush interface has been added to enable the module to be used from the command line.

#26

I am interested in how this is progressing.

You say this module preserves nids and uids (and therefore presumably noderefs and userrefs). What happens during the sync process if the node id or user id already exists on the destination db?

#27

Think in terms of the module acting like rsync with the --delete option.

If the node id or user id (or term id) already exists in the destination DB then those nodes/users are updated with the information in the incoming data file. The plan is that the nodes and users are first exported to a data file - then this data file is imported into a destination DB - and after the importation the users/nodes should be an *exact* match of the data which was exported from the first Drupal site.

This is useful for the following reason.

Say you have a live site which you export from - and it has a new node with id of 123.

You also have a beta copy of the site which is being tested - and a developer creates a test node with dummy data which has an id of 123.

If you import from the data file exported from the live site the data for node 123 will overwrite the data in the beta site. I.e. it will replace test/dummy data with data from the live site.

Similarly, if on the beta site a developer creates a test node of id 123456 which does not exist on the live site (which exported the data) then this node will be deleted from the destination DB.

Preserving the node ID's of everything - terms/users/nodes - will (as you say) keep noderefs and userrefs working.

As mentioned - the idea is that terms/users/nodes will be exactly replicated between Drupal sites.

This is all part of a larger development/deployment plan which has been documented in the module. I'm currently working on enhancing the plan and should be posting a white paper when it is ready.

#28

Priority:major» normal
Status:needs review» needs work

Sorry for the delay, but you have not listed any reviews of other project applications in your issue summary as strongly recommended here: http://drupal.org/node/1011698

manual review:

  1. data_export_import.module: don't execute PHP code gloablly here, as this will be called on every single page request. Include your files only in the appropriate hooks/functions.
  2. data_export_import_callback_overview(): all user facing text must run through t() for translation.
  3. data_export_import_menu(): why do you include files in hook_menu()? You don't need them there?
  4. dei.drush.inc: make sure to use your full module name as function prefix to avoid name collisions with other modules.

#29

Thanks for the review.

I've not been able to review other projects recently as I've been working hard on testing this module. It's been used to pull in 10GB dataset files and has been used in a large deployment scenario.

Also, I've been writing up a document called 'Drupal websites development and deployment strategies' which shows how this module can be used as part of large deployments. Once the document is finished it will be published as a white paper (or possibly a book!) and
made available to the Drupal community.

Please note that I've aided the review process by helping to clear up the issue of coding standards testing. I raised the issue at http://groups.drupal.org/node/195303 and the conclusion RE using ventral.org has become part of the instructions on the 'Tips for ensuring a smooth review' page - http://drupal.org/node/1187664 - hopefully that shows I'm keen to help!

I've worked on your module review points. Here's my replies (not in the same order):

2. data_export_import_callback_overview(): all user facing text must run through t() for translation.
DONE.
I've changed the output in that function.

4. dei.drush.inc: make sure to use your full module name as function prefix to avoid name collisions with other modules.
DONE.
I've changed the drush.inc file so that the functions use the full module name as the prefix. The function names are related to the drush.inc file name so I've changed the file name as well.

1.data_export_import.module: don't execute PHP code gloablly here, as this will be called on every single page request. Include your files only in the appropriate hooks/functions.
and
3. data_export_import_menu(): why do you include files in hook_menu()? You don't need them there?

I've designed the module so it can be extended relatively easily by adding new profiles. If someone adds a new profile file into includes/profiles then it will create a new tab on the data_export_import interface and allow the user to export and import data.

Currently, the module enables user, taxonomy terms and nodes to be exported and imported. I wanted to allow for expansion by adding new profiles. This means that users can relatively easily add new profiles to handle any weird and wonderful custom data they may need to export/import.

As part of this profile plugin setup the code in data_export_import.module loops through all the files it finds in the includes/profiles directory and adds in the callback functions and the extra menu items (tabs). That is what the code is doing.

I don't really want to remove the ability to easily add new profiles - and can't see how it can be implemented differently. When the admin page is requested it needs to read the profiles directory to see what tabs to load and to have the callback functions available.

Thanks again for the review - my current employer has invested heavily in this module and it would be great if we can get it approved as a full Drupal module.

nobody click here