Please help me! My project involves indexing millions of nodes (50 million), but the nodes have no body, just titles and taxonomy facets (about 6 facet groups). The problem is that the module only sends about 100 nodes per second. I'm running a vanilla Drupal installation with almost no modules (ONLY the modules required by apachesolr).
I'm pretty sure the problem is in how the Drupal API handles sending nodes, not in Apache Solr itself.
PLEASE note that the data per node is very small; only my node count is very big. I need to send at least 10,000 to 20,000 nodes PER SECOND to the indexer. (Please notice that I'm not asking to increase Apache Solr's indexing performance. The problem happens BEFORE indexing: I think it is how the Drupal API gathers data from MySQL.)
PLEASE help me!!
Thanks!!
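For scale, a quick calculation with the numbers from the question shows why 100 nodes per second is not workable for 50 million nodes. It is a minimal sketch that assumes a constant send rate and ignores Solr commit and optimize time.

```python
# Back-of-the-envelope indexing time for the figures in the question:
# 50 million nodes at the observed rate vs. the requested target rates.
NODE_COUNT = 50_000_000


def indexing_hours(nodes: int, nodes_per_second: float) -> float:
    """Wall-clock hours needed to send `nodes` at a constant rate."""
    return nodes / nodes_per_second / 3600


if __name__ == "__main__":
    for rate in (100, 10_000, 20_000):
        hours = indexing_hours(NODE_COUNT, rate)
        print(f"{rate:>6} nodes/s -> {hours:8.2f} hours")
```

At 100 nodes per second this comes to roughly 139 hours (almost 6 days), while 10,000 and 20,000 nodes per second bring it down to about 1.4 and 0.7 hours.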
Comments
Comment #1
jonnyp commented: The UI will limit you to 200 nodes per batch, but you can use PHP to force a higher limit, such as
It might be wise to start in the low thousands and work your way up, as you may well get out-of-memory errors if you set it too high.
Comment #2
Operations-1 commented: Thank you very much for your tip, but it doesn't seem to increase performance. I tested with 200 nodes per cron run and with 2000 nodes per cron run, and both take exactly the same time. Note that I'm running it on a Rackspace cloud server with 16GB of RAM, so hardware is not the problem.
PLEASE, are there any other tips you can give me? I'm desperate.
Comment #3
pwolanin commented: You need to investigate using the Solr data import handler, or perhaps the CSV update handler. In either case you'll have to do some custom work to get the indexed content to match what the module sends, but doing all those node_load() calls in Drupal will always be relatively slow.
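The result in comment #2 (200 vs. 2000 nodes per cron run taking the same time) fits a simplified cost model in which the per-node work, such as node_load(), dominates. The model and its numbers are assumptions for illustration, not measurements: total time is the per-node cost times the node count plus a fixed overhead per batch, so raising the batch size only shrinks the second term.

```python
import math


def total_seconds(nodes: int, batch_size: int,
                  per_node_s: float, per_batch_s: float) -> float:
    """Simplified model: fixed work per node plus fixed overhead per batch.

    per_node_s stands in for node_load() and document building;
    per_batch_s stands in for the Solr request and cron bootstrap.
    Both values are hypothetical placeholders.
    """
    batches = math.ceil(nodes / batch_size)
    return nodes * per_node_s + batches * per_batch_s


if __name__ == "__main__":
    # 0.01 s per node matches the ~100 nodes/s from the question;
    # 0.05 s per batch is a made-up overhead figure.
    for batch in (200, 2000):
        t = total_seconds(2000, batch, per_node_s=0.01, per_batch_s=0.05)
        print(f"batch={batch:>4}: {t:.2f} s")
```

With these placeholder numbers, 2000 nodes take about 20.5 s in batches of 200 and about 20.05 s in a single batch of 2000, so only cutting the per-node cost makes a real difference.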
An alternative would be to write some custom indexing code in Drupal that uses direct SQL queries to build the documents to index more simply. If you combine that with the CSV update handler, you could also avoid the overhead of building the XML documents.
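A minimal sketch of that approach, written in Python rather than as Drupal code, and using an in-memory SQLite database as a stand-in for MySQL. The table names `node` and `taxonomy_index` assume a Drupal 7-style schema (Drupal 6 uses `term_node`), and the Solr field names `id`, `title` and `tid` are hypothetical; they must match your schema.xml and will not automatically match what the apachesolr module sends. The `f.tid.split` and `f.tid.separator` parameters are the CSV update handler's per-field options for multi-valued fields.

```python
import csv
import io
import sqlite3
from urllib.parse import urlencode

# One query per batch instead of one node_load() per node. On MySQL the
# equivalent aggregate is GROUP_CONCAT(ti.tid SEPARATOR '|').
QUERY = """
    SELECT n.nid, n.title, group_concat(ti.tid, '|') AS tids
    FROM node n
    LEFT JOIN taxonomy_index ti ON ti.nid = n.nid
    WHERE n.status = 1 AND n.nid > ?
    GROUP BY n.nid, n.title
    ORDER BY n.nid
    LIMIT ?
"""


def fetch_batch(conn, after_nid, limit):
    """Fetch the next batch of published nodes with their term IDs."""
    return conn.execute(QUERY, (after_nid, limit)).fetchall()


def rows_to_csv(rows):
    """Serialize rows as CSV with a header; term IDs stay '|'-joined."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "title", "tid"])
    for nid, title, tids in rows:
        writer.writerow([nid, title, tids or ""])
    return buf.getvalue()


def csv_update_url(base_url):
    """URL for the CSV update handler, splitting the tid field on '|'."""
    params = {"f.tid.split": "true", "f.tid.separator": "|", "commit": "false"}
    return f"{base_url}/update/csv?{urlencode(params)}"


def demo_connection():
    """Tiny hypothetical dataset standing in for the Drupal database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, status INTEGER);
        CREATE TABLE taxonomy_index (nid INTEGER, tid INTEGER);
        INSERT INTO node VALUES (1, 'First', 1), (2, 'Second, with comma', 1),
                                (3, 'Draft', 0);
        INSERT INTO taxonomy_index VALUES (1, 10), (1, 11), (2, 12);
    """)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    after_nid = 0
    while True:
        rows = fetch_batch(conn, after_nid, limit=1000)
        if not rows:
            break
        body = rows_to_csv(rows)
        # POST `body` to this URL (network call omitted in this sketch).
        print(csv_update_url("http://localhost:8983/solr"))
        print(body)
        after_nid = rows[-1][0]
```

Paging with `nid > last_nid` instead of `OFFSET` keeps each batch query cheap even deep into 50 million rows. The CSV body would then be POSTed to the handler, for example with `curl --data-binary`, followed by a single commit at the end.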
You need to look at the Solr docs and probably join the Solr mailing list to get more help.
Comment #4
jpmckinney commented: See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed