Data transformation -- can we expose $data to a hook?
kratib - February 28, 2009 - 05:30
| Project: | Node import |
| Version: | 6.x-1.0-rc4 |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
Thanks for an excellent module. Does it support data transformation, or do you plan to support it in the future?
Data transformation simply means the ability to modify the column values of each row, based on some client-supplied transformation function that gets called for each column value. This gives the opportunity to insert business logic during the import process. For example, when importing products with categories, we need to be able to map the input category terms (coming from the external system) to the terms of the Drupal taxonomy.

#1
Well, there is a hook_node_import_values_alter() and a hook_node_import_postprocess(). You could insert your logic there.
See http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/node_import... - there is a doxygen.txt file that could help you convert it into something more readable.
I must agree that the documentation is not complete yet, but if you have questions you know how to find me...
Or did you have some other idea on how this client-supplied transformation function would be supplied?
#2
Thanks for your reply. I will try to illustrate my point further to show that these hooks are not ideal (at least in their current form) to support generic business logic.
Let's say I am importing books from two different sources. The category system of each source is different from the other, and different from the Drupal taxonomy for content type Book. I therefore need two functions to map Source 1 category => Drupal, Source 2 category => Drupal. Where to place those functions?
Ideally, I would like to tell node_import the following:
I will be performing 2 *types* of import on this site. For each type, here is the content type to be created (Book), the fields to map from the source, and the transformation to apply on the category values that you read from the source before you populate the taxonomy field.
With the hook system you have in place today, as far as I can understand of course, we can only describe the content type and the field mappings. We can define neither an import *type* nor the type-specific transformations that should occur.
The simplest addition I can think of for node_import is to allow naming a "new import". This name can be used as the type of import. If this name is passed to your hooks, then someone could write a business-specific hook that switches on the type of import and the field to be processed.
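To make the idea concrete, here is a minimal sketch of what such a business-specific function might look like if the import name were passed along. Everything in it is hypothetical: node_import does not currently pass an import name to its hooks, and the import names ('source1_books', 'source2_books'), the field name 'category' and the category tables are made up for illustration.

```php
<?php
/**
 * Hypothetical: translate a source category into a Drupal term name,
 * switching on the name of the import and on the field being processed.
 */
function yourmodule_transform($import_name, $fieldname, $value) {
  // Only the (made-up) 'category' field is transformed.
  if ($fieldname != 'category') {
    return $value;
  }
  switch ($import_name) {
    case 'source1_books':
      $map = array('SF' => 'Science fiction', 'BIO' => 'Biography');
      break;

    case 'source2_books':
      $map = array('sci-fi' => 'Science fiction', 'memoir' => 'Biography');
      break;

    default:
      // Unknown import type: leave the value alone.
      return $value;
  }
  // Fall back to the original value when the category is not in the table.
  return isset($map[$value]) ? $map[$value] : $value;
}
```

The point of the sketch is only that one function can serve both sources once it knows which import is running.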
The only question that remains is which hook should be used? I had already read the hook_node_import_docs.php file although I must admit that a visual diagram of the calling sequence would be much easier to follow :-)
Thanks for reading, hope that makes sense, and I'd be happy to contribute coding and/or testing to encourage that great module.
#3
I forgot to mention that the usage scenario I have in mind calls for repeatedly importing from those 2 sources, not just a one-time import for each. That means the import type definition should also be persisted.
#4
Firstly: you want to save an import type definition so you can reuse it.
One of the planned features was to be able to save a node import task. This means that you would go through the wizard and do the mapping, set defaults, etc., but instead of (or in addition to) doing the import with the supplied file, the node import task is also saved as a "profile". Profiles would use the "name" property of the task more intensively (e.g. to reference it).
Next time a user wants to do an import with the same parameters, he would be able to select an already created profile (or create a new one which would go through the wizard again like now).
This has not yet been implemented, but maybe I should move it up on the todo list. The task is already saved but only to do the import, not for use (as a profile) for a new import. So this would involve changing some steps on the form but is not so big a job.
Secondly, you want to attach business logic.
This means (in your example): category of source 1 → drupal taxonomy term. You probably have not realised it (more likely because it is not documented as it should be), but node_import already does some data transformation. For example, in your CSV file you can specify "webmaster" as "authored by" and node_import will convert this to a uid if needed. This is just the same kind of data transformation you would want, except you have a more special case.
The data transformation of a taxonomy term to a taxonomy tid is similar, and probably the thing you would want to extend. At present you would need to write a module that does almost the same as node_import_check_taxonomy_term(), but then for your category of source 1 → drupal taxonomy term → drupal taxonomy tid.
Something like (untested):
<?php
/**
* Implementation of hook_node_import_fields_alter().
*/
function yourmodule_node_import_fields_alter(&$fields, $type) {
// Check if the type is a node content type.
if (($node_type = node_import_type_is_node($type)) !== FALSE) {
// Now you would need to alter the field. Let's say the Drupal vocabulary has
// ID 42. What you want to do is add a new 'preprocess' function to the
// array of already present 'preprocess' functions. This function would
// convert the input value from the CSV to a tid that taxonomy can use.
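if (isset($fields['taxonomy:42'])) {
// array_unshift() modifies the array in place and returns the new count,
// so prepend to a copy and assign the copy back.
$preprocess = (array) $fields['taxonomy:42']['preprocess'];
array_unshift($preprocess, 'yourmodule_business_logic');
$fields['taxonomy:42']['preprocess'] = $preprocess;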
}
// If you would want to add the same business logic preprocess function
// to all taxonomy forms, you could do a:
// foreach ($fields as $fieldname => $fieldinfo) {
// if ($fieldinfo['input_format'] == 'taxonomy_term') {
// // same thing
// }
// }
// Note that yourmodule would need to run after the taxonomy
// module currently. That's because supported/taxonomy.inc
// does more or less the same thing in taxonomy_node_import_fields_alter().
// I'll change that because it can do it just as well in
// taxonomy_node_import_fields().
}
}
// What you have done now is told node_import that before submitting
// the value from the CSV file, you would first like to run a
// "yourmodule_business_logic" function on the value.
// See from the hook_node_import_docs.php:
/**
* ...
* - \b "preprocess" : Array of callback functions. Each of the functions
* can preprocess the mapped value. This is a way to validate and
* preprocess the input. These preprocess functions can
* be used in companion with the hook_node_import_values_alter().
*
* The signature of the callback is:
* @code
* $return = $function(&$value, $field, $options, $preview);
* @endcode
* and it should alter the $value passed. Note that if the field
* "has_hierachy", the value passed will be an array (grandparent,
* parent, child).
*
* The $return value should be FALSE if there is an input error,
* NULL if there was not, but not a valid value could be found or
* TRUE if a valid value could be found and so other preprocess
* functions can be skipped.
*
* See @ref node_import_preprocess for examples.
*
* Defaults to array().
*/
/**
* A preprocess function that converts a value to the correct
* taxonomy term.
*/
function yourmodule_business_logic(&$value, $field, $options, $preview) {
// The function is given the $value (from the CSV file or from a previously
// executed preprocess function) for the given $field (as returned by
// node_import_fields()). It also passes some options for the field if
// you have implemented hook_node_import_options().
// The $field includes $field['vocabulary'] which is the vocabulary
// object by the way.
// Now you need to lookup the $value in your "category source 1"
// and convert it to a "drupal taxonomy term tid".
$value = yourmodule_lookup_source1($value);
// You could give some errors if the value does not appear to be
// valid in source 1. Look at some of the other examples such as
// node_import_check_user_reference, etc (all *_check_* functions).
// If you have successfully looked up the value you can return
// TRUE here. If you still want the builtin preprocess functions
// to run (which convert a term name to a term tid) you would
// return NULL. If there was an error, you would return FALSE.
return TRUE;
}
?>
This is a pretty small module to implement. OK, it is not totally clear right now how to use the hooks together... I'll need to write some clearer "overview" docs (maybe even including this example).
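The example above calls yourmodule_lookup_source1() without defining it. A minimal sketch of such a helper might look like the following, assuming the mapping is a fixed table from source 1 category names to Drupal term IDs. The category names and tids are made up; a real module might load the table from the database instead.

```php
<?php
/**
 * Hypothetical lookup: source 1 category name => Drupal term ID (tid).
 * Returns NULL when the category is unknown.
 */
function yourmodule_lookup_source1($category) {
  // Made-up table for illustration only.
  static $tids = array(
    'fiction' => 101,
    'non-fiction' => 102,
    'children' => 103,
  );
  // Source data is often inconsistent in case and whitespace.
  $key = strtolower(trim($category));
  return isset($tids[$key]) ? $tids[$key] : NULL;
}
```

With a helper like this, the preprocess function could return TRUE only when a tid was found and NULL otherwise, so that the builtin preprocess functions still get a chance to run on values it does not recognise.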
The question is now... how can I allow users filling in the wizard to submit "business logic" on the fly. They probably will have to specify it in PHP. Need to think about it, but it is more longterm for sure. Maybe it should be possible to have modules (or users) to define extra preprocess functions and let the user choose from them in the wizard.
If you still have questions, or if anything is unclear... let me know.
[Edit: array_shift() needed to be array_unshift() in the example]
#5
Thanks for the thoughtful reply. One small addition that would be necessary to the hook code above is to include the profile object as an argument to hook_node_import_fields_alter(), so that I can choose the correct preprocessor based on which profile is active. Otherwise I still cannot distinguish between Source 1 and Source 2.
Concerning the addition of the business logic on the fly, I suggest a new hook whereby each business logic module returns its own preprocessors, so I can choose between them in the UI.
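A rough sketch of what such a hook could look like follows. The hook name hook_node_import_preprocessors() and its return format are purely hypothetical, since no such hook exists in node_import today. In Drupal the collecting would presumably be done with module_invoke_all(); a plain loop is used here so the sketch stands alone.

```php
<?php
/**
 * Hypothetical hook implementation: a module announces the preprocess
 * callbacks it offers, keyed by callback name, with a label for the UI.
 */
function yourmodule_node_import_preprocessors() {
  return array(
    'yourmodule_source1_category' => 'Source 1 category to Drupal term',
    'yourmodule_source2_category' => 'Source 2 category to Drupal term',
  );
}

/**
 * Collect the callbacks offered by the given modules (standalone
 * stand-in for what module_invoke_all() would do inside Drupal).
 */
function example_collect_preprocessors($modules) {
  $preprocessors = array();
  foreach ($modules as $module) {
    $function = $module . '_node_import_preprocessors';
    // Modules that do not implement the hook are simply skipped.
    if (function_exists($function)) {
      $preprocessors = array_merge($preprocessors, $function());
    }
  }
  return $preprocessors;
}
```

The wizard could then present the collected labels as a select list for each mapped field.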
#6
I too would be interested in data transformation features. My needs are a little simpler, though. I just want to map values on a 1-1 basis.
For example, if I have a column in the .csv file that stores "Y" and "N", and a field in a Drupal content type that allows "Yes" and "No", then I'd like to be able to tell the import process that Y means Yes and N means No.
Similarly, if I have a column in the .csv file that holds a pipe-delimited list of 3-digit codes, I'd like to be able to map those to more verbose taxonomy terms.
Business rules would be great as well, but even a simple 1-1 conversion would save a lot of SQL/scripting gymnastics on the export side.
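Until something like that exists in the UI, a 1-1 conversion could probably be done with a preprocess callback that uses the signature quoted in #4. The sketch below is untested against node_import itself; the function names and the code table passed to the second helper are made up for illustration.

```php
<?php
/**
 * Hypothetical preprocess callback: map "Y"/"N" to "Yes"/"No".
 */
function yourmodule_map_yes_no(&$value, $field, $options, $preview) {
  $map = array('Y' => 'Yes', 'N' => 'No');
  $key = strtoupper(trim($value));
  if (isset($map[$key])) {
    $value = $map[$key];
    // A valid value was found, other preprocess functions can be skipped.
    return TRUE;
  }
  // Not a value we know: let other preprocess functions have a go.
  return NULL;
}

/**
 * Hypothetical helper: expand a pipe-delimited list of codes into
 * term names. Unknown codes are dropped.
 */
function yourmodule_expand_codes($value, $terms) {
  $names = array();
  foreach (explode('|', $value) as $code) {
    $code = trim($code);
    if (isset($terms[$code])) {
      $names[] = $terms[$code];
    }
  }
  return $names;
}
```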
Thanks for the great module!
#7
I'm creating a php transformation module (or could be a patch to this module) for node_import which adds a computed php field to the "map" screen for each row. The idea is to allow an admin user to specify transformations on the csv data right from the ui.
I've achieved basic transformations where there is a 1:1 mapping of the CSV row to a field value (such as, say, performing a ucfirst() on the "Title" field) by adding the php string to be executed to the $options array, and then performing the transformation using hook_node_import_values_alter().
What doesn't seem to be possible with the current hooks is to reference other fields in the CSV data. (A basic use case would be to concatenate two csv columns into the Title field). Is there some way we can expose the CSV data that's in the $data var in node_import_values() for reference in a hook?
I've gotten it working by changing the following in node_import.inc:
drupal_alter('node_import_values', $values, $type, $defaults, $options, $fields, $preview);
... to ...
drupal_alter('node_import_values', $values, $type, $defaults, $options, $fields, $preview, $data, $map);
Is there another way to do this without having to hack the module? Or should I submit this as a patch?
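For illustration, with the changed call a hook implementation for the concatenation use case might look roughly like this. It is a sketch under assumptions: $data is taken to be the current CSV row as a list of column values, and the column positions 0 and 1 are made up. $map is accepted but not used here.

```php
<?php
/**
 * Implementation of hook_node_import_values_alter(), assuming the
 * modified drupal_alter() call that also passes $data and $map.
 */
function yourmodule_node_import_values_alter(&$values, $type, $defaults, $options, $fields, $preview, $data = array(), $map = array()) {
  // Build the title from two CSV columns, e.g. first and last name.
  // Columns 0 and 1 are example positions, not anything node_import defines.
  if (isset($data[0]) && isset($data[1])) {
    $values['title'] = trim($data[0]) . ' ' . trim($data[1]);
  }
}
```

The two extra parameters default to empty arrays, so the same function would still work if the hook were invoked by an unmodified node_import.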
Thank you for your help and for the awesome module.