Optimize slow queries

iandit - March 13, 2009 - 03:49
Project:Resource Description Framework (RDF)
Version:6.x-1.0-alpha6
Component:Miscellaneous
Category:feature request
Priority:normal
Assigned:Arto
Status:active
Issue tags:Performance
Description

I've been working on a site and recently I've enable the "slow queries" logs.. I've noticed that two queries is always into them. Is there ways to improve these or to reduce the query time?

#1
--------------------------------
# Time: 090312 20:40:36
# User@Host: mydb @ localhost []
# Query_time: 35 Lock_time: 0 Rows_sent: 0 Rows_examined: 3882650
SELECT COUNT(*) AS dups, gid, sid, pid, oid, tid, lang, data FROM rdf_data_calais_node GROUP BY gid, sid, pid, oid, tid, lang, data HAVING dups > 1;

#2
--------------------------------
# Time: 090312 20:43:24
# User@Host: mydb @ localhost []
# Query_time: 178 Lock_time: 0 Rows_sent: 9624 Rows_examined: 3073818
SELECT DISTINCT g.uri g FROM rdf_data_calais_node d INNER JOIN rdf_resources g ON d.gid = g.rid;

#1

Arto - March 13, 2009 - 03:56
Title:slow queries... any way to improve?» Optimize slow queries
Category:support request» feature request
Assigned to:Anonymous» Arto

Thanks for reporting this.

There are some performance optimizations being done for the next release, and we'll have a look at this as well.

Note that the first query seems to be the duplicate checking functionality that you can configure at admin/settings/rdf. Since it's returning 0 rows, just disable the "Merge duplicate statements on cron runs" setting to eliminate that query.

Do you actually have some 3 million rows in those tables, btw? If so, we'll need to add some indexes.

#2

iandit - March 13, 2009 - 04:27

Thanks for looking into this and giving me few tips. I've just disable the "Merge duplicate statements on cron runs".

Yeah it seems I have that amount of data in the rdf tables. I'm using the open calais modules so my guess is that its coming from there. So adding some indexes would certainly help ease my mind :P

Great job on this module.

#3

Arto - March 13, 2009 - 04:51

OK, thanks for the confirmation - that amount of data is beyond what I've tested the module with so far, hence why these slow queries haven't been noticed previously. We'll definitely try and fix this for the next release.

#4

febbraro - March 13, 2009 - 22:22

I'm curious how many Calais tagged nodes you have. Can you tell me what the following query returns?

select count(distinct(nid)) from calais_term_node;

We'll see what we can do about the performance though. If it gets to be too much of a drain you can always turn off local RDF storing for Calais at admin/settings/calais/calais-node

#5

iandit - March 14, 2009 - 01:44

Running the query, I get 9830. Doesn't seem an exorbitant high number to me... but you'll tell me shortly i guess.

 
 

Drupal is a registered trademark of Dries Buytaert.