Posted by phil.cryer on September 4, 2009 at 7:57pm
Jump to:
| Project: | Bibliography Module |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | support request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed (fixed) |
Issue Summary
When a user hits 'Browse' it often takes a very long time to load. We see the pagination at the bottom, but it seems to be running through everything first as mysql is using high resources on the server. For an example: http://citebank.biodiversitylibrary.org/browse - also choose between the 'sort' tabs at the top Author - Title - Type - Year, every hit on one of those takes a long time to come back as well.
Two ideas, can this be reworked to have it only grab one page at a time, or could we make it an option to farm this work out to Solr? We're running Solr for all searches, and we'll need it for the amount of records we're expecting, and know it could handle this task easily.
Thanks
Comments
#1
changed to 'support request'
#2
Hi Phil,
Yes, I've noticed this on your site before. As it stands, it is only getting one page at a time, however, in order to sort the records correctly, MySQL does read to whole table, and it's even worse if there are sub-sorts because it may have to build an intermediate table (often in a temporary file) and resort on that.
I'm afraid I don't know much about Solr, but I think the issue here is more of a sorting one than a searching one, and in order to sort correctly, I don't know there is any alternative to reading all the records. One possible solution (which I haven't tried yet) might be to create some additional tables which one of which might only contain the year and nodeID, it might be quicker to sort this two column table rather than the full biblio table.
I noticed that it's currently impossible to "export" all 34833 entries from your site, so I reworked the export routines to stream the data to the browser one at a time, rather than building the entire export file and then sending to the browser. It's a bit slower, but for large numbers of entries, it's really the only way to make it happen without using a huge amount of memory on the server.
Also, have you considered pointing your "Browse -> By Author" menu to http://citebank.biodiversitylibrary.org/browse/authors instead of just sorting all the entries by author? It seems a little more intuitive to me since you can then really "Browse" all the Authors contained in your database.
Ron.
#3
Hey Phil,
After looking at this a bit, I think I have come up with an incredibly simple solution which doesn't require any database changes...
Could you apply this patch to your biblio.pages.inc file and see how it performs. On my system, it cut page load times (for 6000 entries) from ~20 sec. down to almost instantaneous.
### Eclipse Workspace Patch 1.0
#P biblio
Index: biblio.pages.inc
===================================================================
RCS file: /cvs/drupal-contrib/contributions/modules/biblio/biblio.pages.inc,v
retrieving revision 1.1.2.94
diff -u -r1.1.2.94 biblio.pages.inc
--- biblio.pages.inc 6 Aug 2009 15:42:15 -0000 1.1.2.94
+++ biblio.pages.inc 5 Sep 2009 00:53:53 -0000
@@ -110,12 +110,12 @@
$selects[] = "DISTINCT(n.nid)";
$count_selects[] = "DISTINCT(n.nid)";
- $selects[] = "n.*";
- $selects[] = "b.*";
- $selects[] = "bt.name as biblio_type_name";
+ //$selects[] = "n.*";
+ //$selects[] = "b.*";
+ //$selects[] = "bt.name as biblio_type_name";
$joins[] = "left join {biblio} b on n.vid=b.vid ";
- $joins[] = "inner join {biblio_types} bt on b.biblio_type=bt.tid ";
+ //$joins[] = "inner join {biblio_types} bt on b.biblio_type=bt.tid ";
// POSIX regular expression matching, case insensitive
$match_op = ($db_type == 'pgsql') ? '~*' : 'RLIKE';
@@ -461,7 +461,7 @@
$content .= drupal_get_form('biblio_search_form');
}
if ( biblio_access('export')) {
- $content .= '<div class="biblio-export">'.t('Export @count results', array('@count' => $pager_total_items[0])).':' . theme('biblio_export_links'). '</div>';
+ $content .= '<div class="biblio-export">'. theme('biblio_export_links'). '</div>';
}else{
$content .= '<div class="biblio-export">'.t('Found @count results', array('@count' => $pager_total_items[0])) .'</div><br />';
}
@@ -489,8 +489,9 @@
while ($node = db_fetch_object($result)) {
$count++;
- $node->biblio_contributors = biblio_load_contributors($node->vid);
- $node->biblio_keywords = biblio_load_keywords($node->vid);
+ $node = node_load($node->nid);
+ //$node->biblio_contributors = biblio_load_contributors($node->vid);
+ //$node->biblio_keywords = biblio_load_keywords($node->vid);
if (variable_get('biblio_hide_bibtex_braces', 0)) $node->title = biblio_remove_brace($node->title);
if (module_exists("upload")) $node->files = upload_load($node);
if (variable_get('biblio_fix_isi_links', 0)) biblio_fix_isi_links($node);
#4
Ron, this has made a huge difference from a UX standpoint, as well as a systems standpoint. While I'm not sure if this will still be acceptable one we really scale up, for now I'm closing this, please merge this code into the trunk, and thanks!
#5
Great, I just tried it on your site and it is definitely WAY faster. When you say "really scale up" how many entries are you talking?
That will be in the next release.
Ron.
#6
We have an issue, seems you took out a column for 'Type', so from Browse -> Type, it gives the following:
user warning: Unknown column 'bt.name' in 'order clause' query: SELECT DISTINCT(n.nid) FROM node n left join biblio b on n.vid=b.vid WHERE (n.language ='en' OR n.language ='' OR n.language IS NULL) AND ( n.type='biblio' )ORDER BY bt.name desc, CASE WHEN SUBSTR(n.title,1,1)='"' THEN LOWER(SUBSTR(n.title,2)) WHEN SUBSTR(n.title,1,1)='\'' THEN LOWER(SUBSTR(n.title,2)) WHEN SUBSTR(n.title,1,2)='A ' THEN LOWER(SUBSTR(n.title,3)) WHEN SUBSTR(n.title,1,3)='An ' THEN LOWER(SUBSTR(n.title,4)) WHEN SUBSTR(n.title,1,4)='The ' THEN LOWER(SUBSTR(n.title,5)) ELSE LOWER(n.title) END LIMIT 0, 20 in /var/www/devcite.biodiversitylibrary.org/sites/all/modules/biblio/biblio.pages.inc on line 87.
You can view it here: http://devcite.biodiversitylibrary.org/browse&sort=type&order=desc
#7
Wait, if you uncomment the following, it seems to work again:
$selects[] = "bt.name as biblio_type_name";
$joins[] = "inner join {biblio_types} bt on b.biblio_type=bt.tid ";
and still seems to be fast, can you try that on your end? It's working again if you hit the same link again:
http://devcite.biodiversitylibrary.org/browse&sort=type&order=desc
#8
Yes, that is the correct fix. I noticed that a while back but forgot to mention it to you.
#9
Thanks Ron, do you want me to close this then, will that code be in the next release?
#10
This is now in the 1.7 release.
#11
Automatically closed -- issue fixed for 2 weeks with no activity.