I have a site using apachesolr module to index content into Solr.
It's important for me, that my content goes into solr right away, so I have `drush solr-index` running every 2 minutes via cron. I will eventually switch this to real-time inclusion, but I have not tested that code enough yet to push it live.
With that said, I believe apachesolr module makes heavy use of variable_set() to keep track of when last piece of content was indexed.
The problem with using variable_set() is there's major performance issues with it and is best avoided in any code, but admin code. variable_get() is free, but variable_set() creates a semaphore, and requires the entire variable cache to get rebuilt. During this time all variable_get()'s hang.
This is because of the variable_initialize() function which must get called after a variable_set(). http://api.drupal.org/api/drupal/includes!bootstrap.inc/function/variabl...
So, with this all said. Would it be possible for us to use a table or something else to keep track of this stuff, and avoid the use of variable_set in anything which doesn't related to admins performing actions in the backend.
Related Issue
#561990: Avoid variable_set() and variable_del() stampedes
Comments
Comment #1
Nick_vhI did some research and found that we are using the following variables during index time
variable_set('apachesolr_index_last', $stored);
variable_set('apachesolr_index_updated', $updated);
On Cron
variable_set('apachesolr_last_optimize', $time);
How badly does this impact the index process? And if you want, feel free to make a patch to make this better but also show us the impact this has.
Comment #2
Nick_vhComment #3
j0rd CreditAttribution: j0rd commentedIt's not that it messes up the index process. That works fine.
The problem is, while it's indexing....all variable_get()'s hang while the lock is on.
This means that normal traffic hangs and starts to stack up waiting for locks to be given back. If you run indexing every couple minutes, you'll notice.
Only reason I noticed actually, was because of an unrelated issue and I went into MySQL Slow Query log and noticed a lot of slow "commits". Now I just assumed this was because of variable_initialize(), but today I looked into it and it confirmed my suspicions.
Here's what happening.
variable_set() gets called. variable_set() uses db_merge() function and execute. In MergeQuery class from include/database/query.inc you'll notice that in MergeQuery->execute() it performs a database transaction and commit()
The COMMITs is what I notice in my mysql-slow.log. Notice how they all run around */2 minute cron.
Keep your eyes on the Query_Time. 2.15 is 2.15 seconds.
I also notice a lot of these, which I believe are related to the variables needing to get initialized, which are a lot more costly than the above problems.
So with that all said, if we could avoid variable_set() I believe we should.
As for patches, if I have time I can write some, but this isn't my first priority at the moment. I figured I'd post the issue to get some other people to understand the issue for now.
Comment #4
mkalkbrennerInteresting. Which engine are you using for your tables?
Comment #5
mkalkbrennerI don't think that the transaction itself is the problem. It's the fact that variable_set invalidates the variable cache for bootstrap:
The next request rebuilds the cache_bootstrap and all further requests have to wait until it's finished
I consider that a major bug in apachesolr.
But the consequences are bigger. Every module must avoid variable_set in implementations of hook_cron or any hook that is not related to admin settings ... scary!
Comment #6
pwolanin CreditAttribution: pwolanin commentedThis is a good point, just under our radar.
Since we have environment variables now, we probably can and should use those instead, and that will stay away from affecting the core cache
Comment #7
Nick_vhWhat about we move this to environment variables?
apachesolr_environment_variable_set / get
Comment #8
pwolanin CreditAttribution: pwolanin commentedSadly, core itself uses a variable_set I think to track cron, but agreed we could remove all the ones form this module.
Comment #9
mkalkbrennersee #1955746: cron blocks page requests
Comment #10
j0rd CreditAttribution: j0rd commented@mkalkbrenner I agree that's the main problem here. With the variable's needing to get re-initialized and blocking.
I'm using MySQL/InnoDB. I actually have memcached enabled, but I disabled it when using `drush` as it causes some problems. I could probably enable it (and should) for memcache, but I haven't had the time to look into that problem yet.
@pwolanin I'm not running `drush cron`, I'm running `drush solr-index` every 2 minutes. So I don't really ever need to run `drush cron`. IMHO hook_cron in it's current "DO ALL THE THINGS" mode, is horrible and I write my own single use drush scripts to call particular stuff if I need it and run them at what I consider acceptable intervals individually.
So if I can remove variable_set from `drush solr-index` I'm ok. That's what I'm running every 2 minutes via cron.
Comment #11
mkalkbrennerOff topic but maybe useful: that's why I created Cron Control
Comment #12
j0rd CreditAttribution: j0rd commentedWorks for me. We can continue with the exact same behavior, but we need to avoid cache_set as it requires the variables to be re-initialized, which is what's causing problems.
Comment #13
mkalkbrennerhere's a patch
Comment #15
mkalkbrennerComment #17
mkalkbrenner#15 was not the one I wanted to upload. It's an older one.
I created a new one based on #13 but discovered a different issue with it. I will upload a fixed version soon ...
Comment #18
mkalkbrennerComment #19
mkalkbrennerComment #20
j0rd CreditAttribution: j0rd commentedI was unable to apply your patch. Most likely because I'm not using -dev and I've got a bunch of other patches in my solr install.
The only hunk which didn't apply was in apachesolr_cron(). Here's my "diff" for that function, if anyone else needs it.
Going to test on my development box today, and if that works, I'll push this to production on next release.
Comment #21
Nick_vhComment #22
pwolanin CreditAttribution: pwolanin commentedI wouldn't delete the variables in the update hook, in case something goes wrong?
Comment #23
Nick_vhCommitted with those changes
Comment #24
tinflute CreditAttribution: tinflute commentedThank you for this great work #19.
@mkalkbrenner Can you give any notes of how this patch relates to the 6.x branch?
Is it being ported?
Are there any important differences in version that would affect the porting?
I'm going to try to apply the changes 'manually' and will report back.
Comment #25
pwolanin CreditAttribution: pwolanin commentedThe db_query() calls will need to be backported, but looks like it should be easy to port. Patch will not apply that hunk due to the difference in the DB calls.
Comment #26
mkalkbrennerComment #27
mkalkbrenner