Closed (works as designed)
Project:
Feeds
Version:
6.x-1.0-beta9
Component:
User interface
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
6 Dec 2010 at 00:49 UTC
Updated:
22 Apr 2011 at 16:22 UTC
This is my first time using Feeds. I am importing about 30,000 nodes with perhaps 10 CCK fields per node.
It is working, but it starts faster and then slows down. For the first couple of minutes it imports about 6 nodes per minute, then it slows to 1 per minute and eventually stops.
I am wondering if there are any easy performance tweaks. I am on a VPS, so I can make changes. I set PHP to CGI because I have heard it is faster than suPHP.
Attached is the PHP Configuration Editor through WHM. I set the PHP memory limit to 300 and the execution time and input time to 0.
Any advice? Or is this normal? I am not an advanced user, and most of the comments in the issue queue are beyond my understanding. Thank you.
| Comment | File | Size | Author |
|---|---|---|---|
| PHP Configuration Editor.jpg | 235.21 KB | itserich |
Comments
Comment #1
itserich commented:
One thing to add: prior to timing out, I always get this error:
An error has occurred.
Please continue to the error page
An HTTP error 404 occurred. /batch?id=126&op=do
Processing continues for some time after that. I see the 404 error mentioned only once in the issue queue, so perhaps it is a rare problem.
Comment #2
itserich commented:
It went from the behavior described in the original post to importing 50 nodes per minute consistently. I am still not sure how Feeds works, but here is what worked for me.
First, to create new nodes of a specific type, I think you must create a custom feed at Site Building -> Feed Importers -> New importer. When I used the standard node import, it always created nodes of the Story content type. If I had not had a Story content type, I would not even have realized the module was working, so I am grateful I still had it, because it encouraged me to keep going.
Basic Settings - Use Standalone Form (I am not sure how the Attach Form option works), Minimum refresh period (I don't know if that applies to one-time node creation), and Import on submission checked Yes.
Fetcher - I uploaded a CSV file to my web site directory, so I used the HTTP Fetcher. I tried CSV files from both OpenOffice and Excel and noticed no difference.
Parser - CSV parser; the default delimiter is the comma (,).
Node Processor - Choose the content type to be created. I chose Filtered HTML and set nodes to never expire. I don't know how updating existing nodes works.
Mapping - Source is a header-row column name in the CSV file and Target is the node's CCK field name.
GUID should be a unique value, which is always good practice when importing data. I created a sequence-number column in the spreadsheet and used that.
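As an illustration of the mapping above (these column and field names are made up for the example), the CSV might look like this, with the header row supplying the mapping sources and the sequence-number column serving as the GUID:

```
guid,title,body,field_price
1,"First product","Description of the first product",9.99
2,"Second product","Description of the second product",14.50
```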
I have a VPS, so I am able to choose a PHP handler through WHM. I tried suPHP, DSO, and CGI. I don't know the difference, but I read there is a performance difference, and I ended up with CGI.
For PHP configuration, I ended up with a core memory limit of 1000 MB and unlimited (0) execution time and input time.
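For reference, a sketch of the equivalent php.ini directives (the WHM editor ultimately writes settings like these; the exact file location varies by host):

```
; Hypothetical php.ini fragment matching the settings described above.
memory_limit = 1000M      ; core memory limit of 1000 MB
max_execution_time = 0    ; 0 means unlimited execution time
max_input_time = 0        ; 0 means unlimited input time
```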
I do not know if this is optimal but it worked for me.
At first, I would start the upload, the program would seem to stall, I would get an error on the import page, and I would delete the file, create a new one, and restart the import. This led to some duplicates.
Finally, I think I understood: the program first imports all the data into its own storage and then creates the nodes. So even if the CSV file is deleted, the old file's data can still be held by the module, causing duplicate nodes. At least, I think this is what happened.
Also, it created nodes in 50-record batches. Then it would stop and wait for cron to import the next 50. I finally set cron to run every minute and it seemed to work, creating 50 nodes per minute.
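A sketch of a crontab entry for running Drupal's cron every minute (example.com stands in for the actual site URL):

```
# Hit Drupal's cron.php every minute so each run processes the next 50-record batch.
* * * * * wget -q -O - http://example.com/cron.php > /dev/null 2>&1
```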
So, one thing I learned: if the process starts and then there is an error on the import page and the process seems to stall, wait a couple of cron runs to see if it is still processing.
Also, prior to Feeds I had tried Node Import and Migrate. Node Import appears largely inactive / unsupported, and Migrate looks great but requires writing a module to use it, I think. The reason I held off on Feeds, even though I had heard about it, is that I was confused by its RSS-like moniker, but it works great for importing nodes.
Good luck, and keep in mind my impression of how Feeds works may be wrong.
Comment #3
itserich commented:
Another tip:
If there is a desire to stop an import and restart it, it seems the import may continue where the last attempt stopped.
This means, if the import was stopped at record 20, the next attempt may start at 21.
By cloning the feed and giving it a new name, the problem appears to be avoided.
Comment #4
kclarkson commented:
@itserich
Thanks for the quick reply. I got everything to import except the dates. For your import, did you have date fields that needed to be imported? Not the posted date, but date fields for start and end times. I am using the Date module, so they are not plain-text CCK fields but Date CCK fields. It appears everything imports except the dates and one set of taxonomies.
Thanks,
Comment #5
itserich commented:
No, I did not have any dates to import.
Have you ever used the Rules module? You might be able to import the data as a regular CCK text field and use Rules to populate the date field as the nodes are created.
Or there might be an add-on module for Feeds.
Comment #6
itserich commented:
Another tip:
For a one-time import of nodes, set Minimum refresh period to Never.
Otherwise, I think it will try to import more in the future, which may be why I had duplicates. Now that it is set to Never, there appear to be no more duplicates on creation.
Comment #7
itserich commented:
Another tip:
To avoid duplicates at node creation, I think you should set Minimum refresh period to Never.
Comment #8
kclarkson commented:
Just wanted to give everyone a heads-up that I got the date fields to import.
I am using Excel to enter text from an old static HTML page. For the date fields, you need to apply a custom cell format before converting the file to CSV.
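As an illustration (kclarkson does not say which format string was used, so this one is an assumption), an Excel custom number format along these lines yields unambiguous ISO-style dates in the exported CSV:

```
yyyy-mm-dd hh:mm
```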
Comment #9
k3n3dy commented:
@kclarkson
What date format did you use to import successfully?
I'm having an issue with date fields. It seems Feeds needs the dates to include timezone information. For example, if I import a field with the date "2011-02-17 05:00", it is stored as "2011-02-17 03:00" (two hours back, which matches my timezone, GMT-2).
Thanks.
I think I found what I was looking for in this issue: http://drupal.org/node/722740
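The two-hour shift described above can be reproduced with a short sketch (GMT-2 is taken from the comment; the rest is an illustration, not the module's actual code):

```python
from datetime import datetime, timedelta, timezone

# If the importer interprets the CSV value as GMT, a site configured
# for GMT-2 will display it two hours earlier.
site_tz = timezone(timedelta(hours=-2))

# "2011-02-17 05:00" as typed in the CSV, treated as GMT by the importer.
imported = datetime(2011, 2, 17, 5, 0, tzinfo=timezone.utc)

# What the site then shows in its own timezone.
displayed = imported.astimezone(site_tz).strftime("%Y-%m-%d %H:%M")
print(displayed)  # 2011-02-17 03:00

# A UNIX timestamp carries no timezone ambiguity, which is why the
# linked issue suggests it as the unambiguous input format.
print(int(imported.timestamp()))  # 1297918800
```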
Comment #10
kclarkson commented:
I used Microsoft Excel and changed the date fields to a "custom" format, saved, and then converted to CSV.
Comment #11
k3n3dy commented:
Thanks for that. I believe my problem is the same as http://drupal.org/node/722740
Feeds Date mapper converts imported dates to GMT unless in UNIX timestamp format
Probably the custom format used by Excel fixes this.
Thanks.
Comment #12
ron_sparks commented:
Thanks, itserich, for posting this. I am going to work on it today, as Node Import seems to be a roadblock for me.
Comment #13
k3n3dy commented:
Is it possible to use Feeds to update previously imported nodes? I can't find any information about this.
I am able to import nodes. Later on, I need to update some CCK fields in these nodes. I tried a second importer, but it created all the nodes again with empty fields, except for the ones I had filled in the CSV file. I used the same GUID for the records in both files, so it should be able to identify which node each update was referring to.
Any pointer in the right direction on how to do this is very welcome.
Comment #14
k3n3dy commented:
Anyone? No one?
I found some scripts that could be converted into a module to do what I described above, but I think it would be logical to have this functionality in Feeds.
Comment #15
itserich commented:
I have never used this module except to import new data. I think it can be used to regularly check a data source, but updating nodes is beyond my experience. There does seem to be a lot of activity around the module.
Comment #16
kapayne commented:
I had great success updating previously imported nodes using Feeds Node Multisource - http://drupal.org/project/feeds_node_multisource. I created one feed to import the data and create the nodes (the parent), then another feed - the child feed - to update those nodes (based on the GUID).
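To illustrate the parent/child idea (these file and column names are hypothetical), both files carry the same GUID column so the child feed can match the nodes the parent created:

```
# parent.csv - creates the nodes
guid,title,body
1,"First node","Original body text"

# child.csv - updates the matching nodes by guid
guid,field_status
1,"updated"
```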