I've been working on a site and recently I've enable the "slow queries" logs.. I've noticed that two queries is always into them. Is there ways to improve these or to reduce the query time?

#1
--------------------------------
# Time: 090312 20:40:36
# User@Host: mydb @ localhost []
# Query_time: 35 Lock_time: 0 Rows_sent: 0 Rows_examined: 3882650
SELECT COUNT(*) AS dups, gid, sid, pid, oid, tid, lang, data FROM rdf_data_calais_node GROUP BY gid, sid, pid, oid, tid, lang, data HAVING dups > 1;

#2
--------------------------------
# Time: 090312 20:43:24
# User@Host: mydb @ localhost []
# Query_time: 178 Lock_time: 0 Rows_sent: 9624 Rows_examined: 3073818
SELECT DISTINCT g.uri g FROM rdf_data_calais_node d INNER JOIN rdf_resources g ON d.gid = g.rid;

Comments

Arto’s picture

Title: slow queries... any way to improve? » Optimize slow queries
Assigned: Unassigned » Arto
Category: support » feature
Issue tags: +Performance

Thanks for reporting this.

There are some performance optimizations being done for the next release, and we'll have a look at this as well.

Note that the first query seems to be the duplicate checking functionality that you can configure at admin/settings/rdf. Since it's returning 0 rows, just disable the "Merge duplicate statements on cron runs" setting to eliminate that query.

Do you actually have some 3 million rows in those tables, btw? If so, we'll need to add some indexes.

iandit’s picture

Thanks for looking into this and giving me few tips. I've just disable the "Merge duplicate statements on cron runs".

Yeah it seems I have that amount of data in the rdf tables. I'm using the open calais modules so my guess is that its coming from there. So adding some indexes would certainly help ease my mind :P

Great job on this module.

Arto’s picture

OK, thanks for the confirmation - that amount of data is beyond what I've tested the module with so far, hence why these slow queries haven't been noticed previously. We'll definitely try and fix this for the next release.

febbraro’s picture

I'm curious how many Calais tagged nodes you have. Can you tell me what the following query returns?

select count(distinct(nid)) from calais_term_node;

We'll see what we can do about the performance though. If it gets to be too much of a drain you can always turn off local RDF storing for Calais at admin/settings/calais/calais-node

iandit’s picture

Running the query, I get 9830. Doesn't seem an exorbitant high number to me... but you'll tell me shortly i guess.

smustgrave’s picture

Issue summary: View changes
Status: Active » Closed (outdated)