Selecting all data causes problems
formicin - January 25, 2007 - 14:25
| Project: | Zeitgeist |
| Version: | HEAD |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Jump to:
Description
I get a lot of search everyday, so "We query an unlimited range because of potential duplicates" thing slows my web server. How about just selecting last 1000 queries and have a little chance of getting duplicates?
How should I edit the module and database to achieve this?
Note: "LIMIT 0 , 1000" doesn't work b/c it says the database to first order them by ts and select latest 1000. This ordering process causes the server to slow.

#1
The basic answer would be to reduce the number of days in history: after that the DB will automatically be truncated on the daily truncature event.
You can also truncate it by hand to remove excess events if you don't/can't want to wait for the next day, by deleting events before a certain timestamp.
Note that if your DB has an index on TS, the ordering should not cause any significant slowdown: although the query is over the whole table, the fetch stops when enough results have been gathered, and even does a mysql_free_result to release the table as soon as possible. Modifying the query to go like:
<?php$sq = SELECT DISTINCT zg.search, zg.category FROM ' . ZGTABLE . ' zg ORDER BY zg.ts DESC ;
$q = db_query_range($sq, 0, $count);
?>
might be more efficient, but it would need to be benchmarked to be sure: DISTINCT queries are usually more of a resource hog than these type of loops, because they mobilize the whole table instead of fetching results by chunks following an index. If you can benchmark this and other variants and can come up with a patch, I'd certainly be interested in it.
[downgrading priority: it does not prevent the module from working, or cause a security violation]
[changing type: this is specific to the requesting site, not an code error]
#2
I tried DISTINCT before, it didn't help. Just like you said, it even slowed down the query.
I think there should be an option like 12 hours / 6 hours / 3 hours for people having a lot of searches every single day.
Can you update it?
#3
Thinking harder on your issue, I'm not sure why you are using ZG and see it as a hog, causing you to want not to keep a long history : do you display the "Latest searches" block for all users ? If so, this is an invitation to spamming your site just to appear in the search results. If this is not so, that code is not invoked for all pages and hence does not cause a performance problem: it only slows when you are watching it.
If you just want to check the stats for yourself, as the site admin, I'd rather recommend that you create a custom, unpublished, PHP page on your site, and gather stats from the
_zeitgeist_statsfunction which has been created just for that purpose, and go to that page to see the results full-page, in the form of your choice. It could go like this, for instance:<?php
/**
* Sample code to display a one-week (starting on monday)
* history of node searches.
* Check out the ZG* constants in zeitgeist.module for other
* predefined time spans.
*/
$zg_stats = _zeitgeist_stats(ZGSPANWEEKM, NULL, 'node');
$zg_stats = $zg_stats->scores;
$searches = array();
foreach($zg_stats as $zg_stat)
{
$search = empty($zg_stat['search']) ? '(empty search)' : $zg_stat['search'];
$searches[] = "$search: ". $zg_stat['count'];
}
$ret = theme('item_list', $searches);
echo $ret;
?>
#4
Just a side note: the offload module should be helpful under site situation like yours, by allowing stats offload and truncation.
#5
No news since january, closing. Feel free to reactivate as needed.
#6
#7
I get used to clearing the zeitgeist table weekly but sometimes I forget to do it. So, this table coming big and the query becames in the top ten slow-queries log.
Just want to tell you my workaround for this problem that could be added to the module as a parameter.
I just added current timestamp to limit last 20 minutes records with the addition of the third line:
$sq = 'SELECT zg.search, zg.category '. 'FROM ' . ZGTABLE . ' zg '
. 'WHERE zg.ts > (UNIX_TIMESTAMP() - 720) '
. 'ORDER BY zg.ts DESC' ;
This limits the amount of records returned drastically as I've verified with the EXPLAIN SELECT zg.search, zg.category FROM zeitgeist zg WHERE zg.ts > (UNIX_TIMESTAMP() - 720) ORDER BY zg.ts DESC
Obviuoslly, the 720 is what could be added as a parameter depending on the average searches rate per hour. This make the module run very fast
Thanks
Gustavo
www.inforo.com.ar
#8
No active developement on 5.x. None planned for 6.x: new features are for 7.x.