Browse takes too long to load (reading all records?) [#568688]

When a user hits 'Browse' it often takes a very long time to load. We see the pagination at the bottom, but it seems to be running through everything first as mysql is using high resources on the server. For an example: http://citebank.biodiversitylibrary.org/browse - also choose between the 'sort' tabs at the top Author - Title - Type - Year, every hit on one of those takes a long time to come back as well.

Two ideas, can this be reworked to have it only grab one page at a time, or could we make it an option to farm this work out to Solr? We're running Solr for all searches, and we'll need it for the amount of records we're expecting, and know it could handle this task easily.

Thanks

Comments

Comment #1

phil.cryer commented 4 September 2009 at 19:57

Category:

feature

» support

changed to 'support request'

Comment #2

rjerome commented 5 September 2009 at 00:16

Hi Phil,

Yes, I've noticed this on your site before. As it stands, it is only getting one page at a time, however, in order to sort the records correctly, MySQL does read to whole table, and it's even worse if there are sub-sorts because it may have to build an intermediate table (often in a temporary file) and resort on that.

I'm afraid I don't know much about Solr, but I think the issue here is more of a sorting one than a searching one, and in order to sort correctly, I don't know there is any alternative to reading all the records. One possible solution (which I haven't tried yet) might be to create some additional tables which one of which might only contain the year and nodeID, it might be quicker to sort this two column table rather than the full biblio table.

I noticed that it's currently impossible to "export" all 34833 entries from your site, so I reworked the export routines to stream the data to the browser one at a time, rather than building the entire export file and then sending to the browser. It's a bit slower, but for large numbers of entries, it's really the only way to make it happen without using a huge amount of memory on the server.

Also, have you considered pointing your "Browse -> By Author" menu to http://citebank.biodiversitylibrary.org/browse/authors instead of just sorting all the entries by author? It seems a little more intuitive to me since you can then really "Browse" all the Authors contained in your database.

Ron.

Comment #3

rjerome commented 5 September 2009 at 01:01

Hey Phil,

After looking at this a bit, I think I have come up with an incredibly simple solution which doesn't require any database changes...
Could you apply this patch to your biblio.pages.inc file and see how it performs. On my system, it cut page load times (for 6000 entries) from ~20 sec. down to almost instantaneous.

### Eclipse Workspace Patch 1.0
#P biblio
Index: biblio.pages.inc
===================================================================
RCS file: /cvs/drupal-contrib/contributions/modules/biblio/biblio.pages.inc,v
retrieving revision 1.1.2.94
diff -u -r1.1.2.94 biblio.pages.inc
--- biblio.pages.inc	6 Aug 2009 15:42:15 -0000	1.1.2.94
+++ biblio.pages.inc	5 Sep 2009 00:53:53 -0000
@@ -110,12 +110,12 @@
 
   $selects[] = "DISTINCT(n.nid)";
   $count_selects[] = "DISTINCT(n.nid)";
-  $selects[] = "n.*";
-  $selects[] = "b.*";
-  $selects[] = "bt.name as biblio_type_name";
+  //$selects[] = "n.*";
+  //$selects[] = "b.*";
+  //$selects[] = "bt.name as biblio_type_name";
 
   $joins[] = "left join {biblio} b  on n.vid=b.vid ";
-  $joins[] = "inner join {biblio_types} bt  on b.biblio_type=bt.tid ";
+  //$joins[] = "inner join {biblio_types} bt  on b.biblio_type=bt.tid ";
 
   // POSIX regular expression matching, case insensitive
   $match_op = ($db_type == 'pgsql') ? '~*' : 'RLIKE';
@@ -461,7 +461,7 @@
       $content .= drupal_get_form('biblio_search_form');
     }
     if ( biblio_access('export')) {
-      $content .= '<div class="biblio-export">'.t('Export @count results', array('@count' => $pager_total_items[0])).':' . theme('biblio_export_links'). '</div>';
+      $content .= '<div class="biblio-export">'. theme('biblio_export_links'). '</div>';
     }else{
       $content .= '<div class="biblio-export">'.t('Found @count results', array('@count' => $pager_total_items[0])) .'</div><br />';
     }
@@ -489,8 +489,9 @@
 
   while ($node = db_fetch_object($result)) {
     $count++;
-    $node->biblio_contributors = biblio_load_contributors($node->vid);
-    $node->biblio_keywords = biblio_load_keywords($node->vid);
+    $node = node_load($node->nid);
+    //$node->biblio_contributors = biblio_load_contributors($node->vid);
+    //$node->biblio_keywords = biblio_load_keywords($node->vid);
     if (variable_get('biblio_hide_bibtex_braces', 0)) $node->title = biblio_remove_brace($node->title);
     if (module_exists("upload")) $node->files = upload_load($node);
     if (variable_get('biblio_fix_isi_links', 0)) biblio_fix_isi_links($node);

Comment #4

phil.cryer commented 14 September 2009 at 19:37

Status:

Active

» Closed (fixed)

Ron, this has made a huge difference from a UX standpoint, as well as a systems standpoint. While I'm not sure if this will still be acceptable one we really scale up, for now I'm closing this, please merge this code into the trunk, and thanks!

Comment #5

rjerome commented 15 September 2009 at 02:12

Great, I just tried it on your site and it is definitely WAY faster. When you say "really scale up" how many entries are you talking?

That will be in the next release.

Ron.

Comment #6

phil.cryer commented 16 September 2009 at 20:59

Status:

Closed (fixed)

» Needs work

We have an issue, seems you took out a column for 'Type', so from Browse -> Type, it gives the following:

user warning: Unknown column 'bt.name' in 'order clause' query: SELECT DISTINCT(n.nid) FROM node n left join biblio b on n.vid=b.vid WHERE (n.language ='en' OR n.language ='' OR n.language IS NULL) AND ( n.type='biblio' )ORDER BY bt.name desc, CASE WHEN SUBSTR(n.title,1,1)='"' THEN LOWER(SUBSTR(n.title,2)) WHEN SUBSTR(n.title,1,1)='\'' THEN LOWER(SUBSTR(n.title,2)) WHEN SUBSTR(n.title,1,2)='A ' THEN LOWER(SUBSTR(n.title,3)) WHEN SUBSTR(n.title,1,3)='An ' THEN LOWER(SUBSTR(n.title,4)) WHEN SUBSTR(n.title,1,4)='The ' THEN LOWER(SUBSTR(n.title,5)) ELSE LOWER(n.title) END LIMIT 0, 20 in /var/www/devcite.biodiversitylibrary.org/sites/all/modules/biblio/biblio.pages.inc on line 87.

You can view it here: http://devcite.biodiversitylibrary.org/browse&sort=type&order=desc

Comment #7

phil.cryer commented 16 September 2009 at 21:02

Wait, if you uncomment the following, it seems to work again:

$selects[] = "bt.name as biblio_type_name";
$joins[] = "inner join {biblio_types} bt on b.biblio_type=bt.tid ";

and still seems to be fast, can you try that on your end? It's working again if you hit the same link again:

http://devcite.biodiversitylibrary.org/browse&sort=type&order=desc

Needs work

» Fixed

This is now in the 1.7 release.

Comment #11

19 October 2009 at 18:50

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Browse takes too long to load (reading all records?)

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

News items

Our community

Documentation

Drupal code base

Governance of community