Please help me! My project involves indexing about 50 million nodes. The nodes have no body, just a title and taxonomy terms for about 6 facet groups. The problem is that the module only sends about 100 nodes per second. I'm running a vanilla Drupal installation with almost no modules (ONLY the ones required by apachesolr).

I'm pretty sure this is a problem with how the Drupal API handles sending nodes, not with apachesolr itself.

PLEASE note that the data per node is very, very small; it is only the node count that is very, very big. I need to send at least 10,000 to 20,000 nodes PER SECOND to the indexer. (Please notice that I'm not asking to improve the Apache Solr indexing performance. The problem happens BEFORE indexing: it is how the Drupal API gathers data from MySQL, I think.)
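For scale, a quick check of what these rates mean for 50 million nodes. At the current rate a full index takes almost six days, while the target rates bring it down to well under two hours, a 100x to 200x speedup:

```latex
\frac{50\,000\,000\ \text{nodes}}{100\ \text{nodes/s}} = 500\,000\ \text{s} \approx 139\ \text{h} \approx 5.8\ \text{days}

\frac{50\,000\,000\ \text{nodes}}{10\,000\ \text{nodes/s}} = 5\,000\ \text{s} \approx 83\ \text{min},
\qquad
\frac{50\,000\,000\ \text{nodes}}{20\,000\ \text{nodes/s}} = 2\,500\ \text{s} \approx 42\ \text{min}
```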

PLEASE help me!!
Thanks!!

Comments

jonnyp

The UI limits you to 200 nodes per batch, but you can use PHP to force a higher limit, for example:

variable_set("apachesolr_cron_limit",2500);

It might be wise to start in the low thousands and work your way up, since setting it too high may well cause out-of-memory errors.

Operations-1

Thank you very much for the tip, but it doesn't seem to improve performance. I tested with 200 nodes per cron run and with 2000 nodes per cron run, and it takes exactly the same time. Note that I'm running it on a Rackspace cloud server with 16GB of RAM, so hardware is not the problem.

PLEASE, are there any other tips you can give me?? I'm desperate.

pwolanin

You need to investigate using the Solr data import handler, or perhaps the CSV update handler. In either case you'll have to do some custom work to get the indexed content to match what the module sends, but doing all those node_load() calls in Drupal will always be relatively slow.
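A minimal sketch of feeding a prepared file to the CSV update handler from PHP, assuming the handler is registered at `/update/csv` (as in Solr's example `solrconfig.xml`) and that the file's columns already match the fields the apachesolr schema expects. The function names, the `tid` field and the `|` value separator are hypothetical. The `f.<field>.split` and `f.<field>.separator` parameters ask the handler to split one cell into a multi-valued field, which is what the taxonomy facets need.

```php
<?php
/**
 * Builds the URL for Solr's CSV update handler (hypothetical helper).
 * $splitFields lists fields whose cells hold several values joined by
 * $valueSeparator (e.g. taxonomy term IDs), so Solr indexes them as
 * multi-valued fields.
 */
function solr_csv_update_url(string $solrBase, array $splitFields, string $valueSeparator = '|', bool $commit = false): string
{
    $params = [];
    if ($commit) {
        $params['commit'] = 'true';
    }
    foreach ($splitFields as $field) {
        $params["f.$field.split"] = 'true';
        $params["f.$field.separator"] = $valueSeparator;
    }
    $url = rtrim($solrBase, '/') . '/update/csv';
    // An empty $params array is falsy, so no query string is appended.
    return $params ? $url . '?' . http_build_query($params, '', '&') : $url;
}

/**
 * POSTs one CSV file to the handler and returns the HTTP status code
 * (hypothetical helper). It reads the whole file into memory, so keep
 * each chunk modest in size rather than sending all 50 million rows at once.
 */
function solr_post_csv_file(string $url, string $csvPath): int
{
    $body = file_get_contents($csvPath);
    if ($body === false) {
        throw new RuntimeException("Cannot read $csvPath");
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $body,
        CURLOPT_HTTPHEADER => ['Content-Type: text/csv; charset=utf-8'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Solr request failed: ' . curl_error($ch));
    }
    // The cURL handle is released when $ch goes out of scope.
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

One design choice worth noting: leaving `$commit` off for every chunk except the last avoids asking Solr to commit after each file.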

An alternative would be to write some custom indexing code in Drupal that uses direct SQL queries to build the documents to index more simply. If you combine that with the CSV update handler, you could avoid the overhead of building the XML documents.
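A rough sketch of the direct-SQL approach: one query joins nodes to their terms, and a single streaming pass groups each node's term IDs into one CSV line, with no node_load() and no XML. It assumes the Drupal 7 core schema (`node`, `taxonomy_index`; on Drupal 6 the node/term mapping is in `term_node`) and no table prefix. The column names `id`, `title` and `tid` are hypothetical and would have to be changed to match the fields the apachesolr module actually sends.

```php
<?php
// Hypothetical query, assuming the Drupal 7 core schema.
// ORDER BY nid lets all of a node's terms be grouped in one streaming pass.
const NODE_TERM_SQL = 'SELECT n.nid, n.title, ti.tid
    FROM node n
    LEFT JOIN taxonomy_index ti ON ti.nid = n.nid
    WHERE n.status = 1
    ORDER BY n.nid';

/**
 * Writes one CSV line for a single node document.
 * Column order must match the header written by write_solr_csv().
 */
function write_solr_csv_row($out, array $doc): void
{
    // Term IDs are joined with '|' so Solr can split them back into a
    // multi-valued field (f.tid.split=true, f.tid.separator=|).
    $fields = [$doc['nid'], $doc['title'], implode('|', $doc['tids'])];
    // An empty escape character gives plain RFC 4180 quoting (PHP >= 7.4).
    fputcsv($out, $fields, ',', '"', '');
}

/**
 * Streams (nid, title, tid) rows, sorted by nid, into CSV for Solr's CSV
 * update handler and returns the number of documents written.
 * $rows may be an array or a PDOStatement fetched with PDO::FETCH_ASSOC.
 */
function write_solr_csv(iterable $rows, $out): int
{
    // Hypothetical Solr field names; adjust them to the schema.xml fields.
    fputcsv($out, ['id', 'title', 'tid'], ',', '"', '');

    $count = 0;
    $doc = null;
    foreach ($rows as $row) {
        $nid = (int) $row['nid'];
        if ($doc === null || $doc['nid'] !== $nid) {
            // A new nid starts: flush the previous node's document.
            if ($doc !== null) {
                write_solr_csv_row($out, $doc);
                $count++;
            }
            $doc = ['nid' => $nid, 'title' => $row['title'], 'tids' => []];
        }
        // The LEFT JOIN yields tid = NULL for nodes without terms.
        if ($row['tid'] !== null) {
            $doc['tids'][] = (int) $row['tid'];
        }
    }
    if ($doc !== null) {
        write_solr_csv_row($out, $doc);
        $count++;
    }
    return $count;
}
```

Since a PDOStatement is Traversable, `$db->query(NODE_TERM_SQL, PDO::FETCH_ASSOC)` can be passed straight in as `$rows`. With MySQL, setting `PDO::MYSQL_ATTR_USE_BUFFERED_QUERY` to `false` keeps PHP from buffering the whole result set in memory.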

You need to look at the Solr docs and probably join the Solr mailing list to get more help.

jpmckinney

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.