node_import & performance/speed
zeezhao - June 14, 2008 - 19:02
| Project: | Node import |
| Version: | 5.x-1.6 |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
I am using node_import to load about 1.3 million nodes from a CSV file. It works, but I can only get it to load about 20,000 nodes per hour, so importing over 1 million rows will take more than 50 hours.
Does anyone have experience improving its performance? I want to make sure nobody is already doing similar work before I start delving into the code.
So far, I have tweaked php.ini and my.cnf, and added a set_time_limit() call in node_import.module, just to get it to load such a huge file at all.
Thanks for your help.

#1
#2
I am wondering whether there is a way to split the large CSV into several files and import them in parallel using some automated process. Since the files share the same structure and settings, this would also reduce the manual interaction in the admin interface.
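A rough sketch of the splitting step, written in Python as a pre-processing script rather than as part of node_import. The function name split_csv, the part_NNNN.csv naming and the assumption that the first row is a header are all illustrative, not something node_import provides:

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_file):
    """Split a CSV into numbered part files, repeating the header in each one.

    Hypothetical helper: assumes the first row of `source` is a header row.
    Returns the list of part-file paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None
    writer = None
    with open(source, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return paths  # empty input, nothing to split
        for i, row in enumerate(reader):
            if i % rows_per_file == 0:
                # Start a new part file every `rows_per_file` data rows.
                if out is not None:
                    out.close()
                path = out_dir / f"part_{len(paths):04d}.csv"
                paths.append(path)
                out = open(path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
            writer.writerow(row)
    if out is not None:
        out.close()
    return paths
```

Each part file can then be fed to a separate import run. Whether parallel runs are actually faster depends on how the database handles concurrent writes to the same tables.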
#3
Hi. I have not had the time to look into this further, but here are some things I planned:
1. Look for ways to improve performance in node_import.module e.g.
- tune the database queries?
- maybe load the file into a temporary database table, and manipulate some of the information in sql and do bulk updates?
- improve _node_import_csv_get_row()?
- cache any information that gets read multiple times?
2. Have some recovery/checkpointing logic in the code, so that if the process is killed and restarted, it will start loading from where it last read.
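The checkpointing idea in point 2 might look something like the sketch below. It is in Python purely for illustration and is not how node_import works today. The checkpoint file, the handle_row callback and the every parameter are all assumptions:

```python
import csv
import os


def import_with_checkpoint(csv_path, checkpoint_path, handle_row, every=1):
    """Process CSV data rows, recording progress so a restart can resume.

    Hypothetical sketch: `handle_row` stands in for whatever creates a node.
    The checkpoint file holds the number of data rows already processed.
    """
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = int(f.read().strip() or 0)

    def save(count):
        # Write to a temp file first, then rename, so a crash mid-write
        # cannot leave a half-written checkpoint behind.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(count))
        os.replace(tmp, checkpoint_path)

    processed = done
    with open(csv_path, newline="") as src:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row
        for index, row in enumerate(reader):
            if index < done:
                continue  # already imported in an earlier run
            handle_row(row)
            processed = index + 1
            if processed % every == 0:
                save(processed)
    save(processed)
    return processed - done  # rows handled in this run
```

With every=1 the checkpoint is written after each row, which is safe but slow. A larger value means fewer writes, but after a crash up to every - 1 rows would be imported a second time unless the import itself detects duplicates.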
Your idea of splitting the files is good. I think you would also have to do things like partitioning the tables (or using InnoDB in MySQL) to avoid bottlenecks when multiple processes run in parallel against the same tables.
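For the temporary-table idea in point 1, here is a minimal sketch of the general pattern: load the CSV in batches into a staging table, then manipulate it with set-based SQL. It uses Python with SQLite as a stand-in for MySQL. The table name import_staging and the title/body columns are made up for illustration and do not match the Drupal schema:

```python
import csv
import sqlite3


def load_into_staging(conn, csv_path, batch_size=1000):
    """Bulk-load CSV rows into a temporary staging table in batches.

    Hypothetical sketch: assumes a header row followed by two columns,
    stored here as `title` and `body`. Returns the number of rows loaded.
    """
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS import_staging (title TEXT, body TEXT)"
    )
    total = 0
    batch = []
    with open(csv_path, newline="") as src:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row
        for row in reader:
            batch.append((row[0], row[1]))
            if len(batch) >= batch_size:
                conn.executemany(
                    "INSERT INTO import_staging VALUES (?, ?)", batch
                )
                total += len(batch)
                batch = []
    if batch:
        conn.executemany("INSERT INTO import_staging VALUES (?, ?)", batch)
        total += len(batch)
    conn.commit()  # one commit for the whole load, not one per row
    return total


def normalise_titles(conn):
    """Example of a single set-based update instead of a per-row loop."""
    conn.execute("UPDATE import_staging SET title = TRIM(title)")
    conn.commit()
```

The point of the pattern is that cleanup happens in a few SQL statements over the whole table rather than once per row in PHP. Turning the staged rows into real nodes would still need code that respects Drupal's node tables and hooks, which this sketch does not attempt.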