This is my first time using Feeds. I am importing about 30,000 nodes with perhaps 10 CCK fields per node.

It is working, but it starts faster and then slows down. The first couple of minutes it imports about 6 nodes per minute, then it slows to 1 per minute and eventually stops.

I am wondering if there are any easy performance tweaks. I am on a VPS, so I can make changes. I set PHP to CGI because I have heard that is faster than suPHP.

Attached is a screenshot of the PHP Configuration Editor in WHM. I set the PHP memory limit to 300 and the execution time and input time to 0.

Any advice? Or is this normal? I am not an advanced user, and most of the comments in the issue queue are beyond my understanding. Thank you.

CommentFileSizeAuthor
PHP Configuration Editor.jpg235.21 KBitserich

Comments

itserich’s picture

Title: new user - starts faster than slows down - first minute 6 nodes, then 1 per minute, then stops » new user - starts faster than slows down - first minute 6 nodes, then 1 per minute, then stops - HTTP error 404

I will add that, prior to timing out, I always get this error:

An error has occurred.
Please continue to the error page

An HTTP error 404 occurred. /batch?id=126&op=do

Processing continues for some time after that. I have seen a 404 error only once in the issue queue, so perhaps that is a rare problem.

itserich’s picture

Title: new user - starts faster than slows down - first minute 6 nodes, then 1 per minute, then stops - HTTP error 404 » configuration tips for new users - what worked for me
Status: Active » Closed (works as designed)

It went from the behavior described in the original post to importing 50 nodes per minute consistently. I am still not sure how Feeds works, but here is what worked for me.

First, to create new nodes of a specific type, I think you must create a custom feed at Site Building -> Feed Importers -> New importer. When I used the standard Node import, it always created nodes of the Story content type. If I had not had a Story content type, I would not even have realized the module was working, so I am grateful I still had it, because it encouraged me to continue.

Basic Settings - Use the standalone form (I am not sure how the attach form works), set the minimum refresh period (I don't know whether that applies to one-time node creation), and check Yes for import on submission.

Fetcher - I uploaded a CSV file to my web site's directory, so I used the HTTP Fetcher. I tried both OpenOffice and Excel and noticed no difference.

Parser - CSV parser; the default delimiter is the comma (,).

Node Processor - Choose the content type to be created. I chose Filtered HTML, with nodes set to never expire. I don't know how updating existing nodes works.

Mapping - Source is the header row in the CSV file, and Target is the node's CCK field name.

GUID should be a unique value, which is always good practice when importing data. I created a list number column in the spreadsheet and used that.
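To illustrate the mapping and GUID setup above, a CSV for Feeds might look like the sketch below. The column names and values are made up; the point is that the header row supplies the Source names for mapping, and a numeric list column serves as the unique GUID:

```csv
guid,title,body
1,"First node","Body text for the first node"
2,"Second node","Body text for the second node"
```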

I have a VPS, so I am able to choose a PHP handler through WHM. I tried suPHP, DSO, and CGI. I don't know the differences, but I read that there is a performance difference, and I ended up with CGI.

For the PHP configuration, I ended up with a core memory limit of 1000 MB and unlimited (0) execution time and input time.

I do not know if this is optimal but it worked for me.
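For reference, the settings above correspond to a php.ini fragment roughly like this (a sketch of my setup; adjust the values to your own server):

```ini
; Generous limits for long-running Feeds imports
memory_limit = 1000M
; 0 means unlimited
max_execution_time = 0
max_input_time = 0
```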

At first, I would start the upload, the program would seem to stall, I would get an error on the import page, and I would delete the file, create a new one, and restart the import. This led to some duplicates.

Finally, I think I understood: the program first imports all the data into its memory, and then creates the nodes. So even if the CSV file is deleted, the old file can still be in memory, causing duplicate nodes. At least, I think this is what happened.

Also, it created nodes in batches of 50 records. Then it would stop and wait for cron to import the next 50. I finally set cron to run every minute, and it seemed to work, creating 50 nodes per minute.

So, one thing I learned: if the process starts, and then there is an error on the import page and the process seems to stall, wait a couple of cron runs to see if it is still processing.
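The one-minute cron I describe above can be set up with a crontab entry along these lines (the URL is a placeholder for your own site, and this assumes a stock Drupal cron.php):

```shell
# Hypothetical crontab entry: trigger Drupal cron every minute
* * * * * wget -O - -q -t 1 http://example.com/cron.php
```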

Also, prior to Feeds I had tried Node Import and Migrate. Node Import appears largely inactive and unsupported, and Migrate looks great but requires writing a module to use it, I think. The reason I waited on Feeds, though I had heard about it, is that I was confused by its RSS-like moniker, but it works great for importing nodes.

Good luck, and keep in mind my impression of how Feeds works may be wrong.

itserich’s picture

Another tip:

If you stop an import and restart it, the import seems to continue where the last attempt stopped.

This means that if the import stopped at record 20, the next attempt may start at record 21.

Cloning the feed and giving it a new name appears to avoid this problem.

kclarkson’s picture

@itserich

Thanks for the quick reply. I got everything to import except the dates. For your import, did you have date fields that needed to be imported? Not the posted date, but date fields for start and end times. I am using the Date module, so they are not plain-text CCK fields but Date CCK fields. It appears that everything imports except the dates and one set of taxonomies.

Thanks,

itserich’s picture

No, I did not have any dates to import.

Have you ever used the Rules module? You might be able to import the data as a regular CCK text field and use Rules to populate the date field as the nodes are created.

Or there might be an add-on module for Feeds.

itserich’s picture

Another tip:

For a one-time import of nodes, set the minimum refresh period to Never.

Otherwise, I think it will try to import more in the future, which may be why I had duplicates. Now that it is set to Never, there appear to be no more duplicates at creation.

itserich’s picture

Another tip:

To avoid duplicates at node creation, I think you should set the minimum refresh period to Never.

kclarkson’s picture

Just wanted to give everyone a heads up that I got the date fields to import.

I am using Excel to enter text from an old static HTML page. For the date fields, you need to apply Excel's custom cell format before converting the file to CSV.
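The cell-formatting step above can also be done outside Excel. Here is a minimal sketch in Python; the input format `%m/%d/%y %I:%M %p` is an assumption about how Excel exported the dates, so adjust it to match your actual data:

```python
from datetime import datetime

def normalize_excel_date(value):
    """Rewrite an Excel-style date like '2/17/11 5:00 AM' as '2011-02-17 05:00'."""
    # Assumed Excel output format; change if your export differs
    parsed = datetime.strptime(value, "%m/%d/%y %I:%M %p")
    return parsed.strftime("%Y-%m-%d %H:%M")

print(normalize_excel_date("2/17/11 5:00 AM"))  # 2011-02-17 05:00
```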

k3n3dy’s picture

@kclarkson
What date format did you use to import it successfully?

I'm having an issue with date fields. It seems Feeds needs the dates to include timezone information. For example, if I import a field with the date "2011-02-17 05:00", it stores it as "2011-02-17 03:00" (two hours back, which matches my timezone, GMT-2).

Thanks.

I think I found what I was looking for in this issue: http://drupal.org/node/722740

kclarkson’s picture

I used Microsoft Excel and changed the date fields to a custom format, saved the file, and then converted it to CSV.

k3n3dy’s picture

Thanks for that. I believe my problem is the same as the one in http://drupal.org/node/722740 (Feeds Date mapper converts imported dates to GMT unless they are in UNIX timestamp format).

The custom format used by Excel probably fixes this.
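Since the linked issue says dates in UNIX timestamp format avoid the GMT conversion, one workaround (a sketch, assuming you can preprocess the CSV column before import) is to convert the local dates to timestamps yourself, baking in the GMT-2 offset from the example above:

```python
from datetime import datetime, timedelta, timezone

def to_unix_timestamp(date_str, utc_offset_hours):
    """Interpret 'YYYY-MM-DD HH:MM' at the given UTC offset; return a UNIX timestamp."""
    naive = datetime.strptime(date_str, "%Y-%m-%d %H:%M")
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return int(aware.timestamp())

# "2011-02-17 05:00" at GMT-2 is 07:00 UTC
print(to_unix_timestamp("2011-02-17 05:00", -2))  # 1297926000
```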

Thanks.

ron_sparks’s picture

Thanks, itserich, for posting this. I am going to work on it today, as Node Import seems to be a roadblock.

k3n3dy’s picture

Is it possible to use Feeds to update previously imported nodes? I can't find any information about this.

I am able to import nodes. Later on, I need to update some CCK fields in these nodes. I tried a second importer, but it created all the nodes again with empty fields, except for the ones I had filled in the CSV file. I used the same GUID for the records in both files, so it should be able to identify which node each update referred to.

Any pointer in the right direction is very welcome.

k3n3dy’s picture

Anyone? No one?
I found some scripts that could be converted into a module to do what I described above, but I think it would be logical to have this functionality built into Feeds.

itserich’s picture

I have only ever used this module to import new data. I think it can be used to regularly check a data source, but updating nodes is beyond my experience. There seems to be a lot of activity around the module.

kapayne’s picture

I had great success updating previously imported nodes using Feeds Node Multisource (http://drupal.org/project/feeds_node_multisource). I created one feed to import the data and create the nodes (the parent), then another feed - the child feed - to update those nodes (based on the GUID).