Performance Improvement Suggestion

socki - October 2, 2007 - 20:10
Project:Voting API
Version:5.x-1.5
Component:Code
Category:feature request
Priority:normal
Assigned:socki
Status:won't fix
Description

There seems to be a potential performance issue on heavily trafficed sites that is induced by the tallying of results. A function that gets called during this process is _votingapi_distinct_values(..), which could be greatly improved with the addition of indexes. For example:

SELECT distinct value_type FROM votingapi_vote WHERE content_type = 'node'

and

SELECT distinct tag FROM votingapi_vote WHERE content_type = 'node'

Both of these queries, when it was run on the lifetimetv.com project resulted in mysql creating a temporary table of over 60K rows so that it could return one row. The time that it took the system to do this process (occasionally over 60 seconds) actually was a major contributor to a dramatic spike in the number database connections. This seems can definitely be improved with the addition of indexes. For example:

ALTER TABLE votingapi_vote
ADD INDEX content_type_value_type ( content_type , value_type )

ALTER TABLE votingapi_vote
ADD INDEX content_type_tag ( content_type , tag )

Since the indexes have been added, performance has improved dramatically. Could this be something that we maybe make part of the install so that it doesn't fly under the radar and get missed? If not, this seems something to keep in mind on heavily busy sites.

#1

m3avrck - October 5, 2007 - 00:04

Great call! We just added these indexes to our site...

#2

kbahey - November 10, 2007 - 17:28

Seconded ...

I saw two large clients suffering from temporary tables, and these indexes solved the problem.

#3

cirotix - May 20, 2008 - 15:25

While the above indexes help a lot, the solution is still not satisfactory for site with :
- high-traffic
- lot of votes
- free taxonomy

My setup. I run a pretty high-traffic website (www.rue89.com, ~12 000 000 pages a month).
We have votes on node and on comments so that votingapi_votes have 1 500 000 rows and votingapi_cache 500 000. The index on votingapi_vote is quit big (~35Mb). We do have freetagging on nodes, new tags are regularly added/removed/modified. We use memcache as a cache backend but the problem is the same with core drupal cache so I will use the cache table names here (such as cache_views)

How does all of this fit together :

- The cache_view table is invalidated everytime a new term is added/removed/modified
From views.module

<?php
/**
* Invalidate the views cache when taxonomy vocabulary changes.
*/
function views_taxonomy($op, $type, $object = NULL) {
  if (
$type == 'vocabulary' && $op == 'delete' || $op == 'insert' || $op == 'update') {
   
views_invalidate_cache();
  }
}
?>

- The function _votingapi_views_values($field = 'tag', $table = 'vote') get called everytime the cache_views table needs to be repopulated. It is called by the hook votingapi_views_tables.
This is this function that call
SELECT distinct value_type FROM votingapi_vote WHERE content_type = 'node'
and
SELECT distinct tag FROM votingapi_vote WHERE content_type = 'node'

With our configuration it can litteraly kill the sql server if the terms are regullarly changed. For the little history, we have someone here for a short time mission on our tags organisation. She is doing a *lot* of tag merging, updating and delating.

The ironical part is that those query generally returns only a single values (votes for tag, percent for value_type). There is also a DISTINCT function FROM votingapi_vote WHERE content_type = 'node' that return an array with a couple of cell. And all that is being calculated again and again every time the cache is calculated.

As a quick fix I return the desired array with hard-coded values for ex :

<?php
 
if ($field == 'tag') {
    return array(
'vote' => 'vote');
  }
  if (
$field == 'value_type') {
    return array(
'percent' => 'percent');
  }
  if (
$field == 'function' && $table = 'cache') {
    return array(
     
'count' => 'count',
     
'average' => 'average',
    );
  }
?>

This is not very nice, but it has provided a dramatic improvement in our performance (see the attached munin graph).

As a nicer fix, the 'value_type', 'tags' and 'function' distinct values should not be calculated like this, and should not be dependent of cache_views flushing.

Actually the distinct values of 'value_types' and 'tags' should be calculated everytime modules using the votingapi define a new vonting api "widget". For most of the cases I guess that this could be done in the hook_install in the .install files as the possible 'values_types' and 'tags' are hard-coded. It could be that other cases let the user define some custom 'tags' at a configuration level. As an exemple advpoll let you do that (but this is a theoricall problem as the content_type in advpoll entries is 'advpoll' and thus is not returned by the advpoll as which let the user define custome values).
The values could be stored in the 'variable' table.

I will try to see what are the possible solutions and post my findings here. Any inputs are welcome

AttachmentSize
sql1.rue89.typhon.net-load-day.png 24.05 KB

#4

jpd972 - June 3, 2008 - 22:41

The patch below is a suggestion to improve votingapi performance with MySQL in the short term:
- better indexing
- one query rewritten for performance

In the longer term, this is a place that would benefit from a better table design, and really aggressive caching.

AttachmentSize
votingapi_20080603.patch 1.97 KB

#5

eggthing - June 25, 2008 - 08:43

So has this code been implemented in 5.x-1.6?

#6

moshe weitzman - October 7, 2008 - 13:57

FYI, this query is specific to D5. D6 does not suffer from this.

#7

wwwoliondorcom - March 18, 2009 - 02:59

Hi,

I also have problems with Dreamhost for slow queries with Voting Api, do you think that this patch will help ?

Thanks a lot !

#8

eaton - June 24, 2009 - 15:56
Status:needs review» won't fix

Views integration in VotingAPI 5.x is fundamentally broken -- it works for a small number of cases but more complex cases like this are not supported, and this isn't being developed further. Views support was rewriten in the 2.x branch to fix these issues. If anyone wants to take a stab at rewriting the Views integration for D5, they're welcome to, but it's not on my radar at present.

 
 

Drupal is a registered trademark of Dries Buytaert.