perf-opt: Leaner json response => more responsive with larger doc-base

danielnolde - November 7, 2009 - 11:06
Project:API
Version:HEAD
Component:Code
Category:task
Priority:normal
Assigned:Unassigned
Status:active
Description

When the autocomplete search box is used, api.module (as running on api.drupal.org or drupalcontrib.org) loads *all* documentation symbols (Drupal functions, objects, variables, etc.) *at once* in one quite large JSON file, and then searches that JSON for the term entered in the search box. The file looks like this:

[ "l", "t", "st", "url", "$id", "arg", "$tag", "$nid", "theme", "$item", "Hooks", "$user", "$conf", "get_t", "$user", "locale", "xmlrpc", "$items", "$image", "$theme", "sess_gc", "$db_url", "_xmlrpc", "$timers", "VERSION", "$element", "path.inc", "$queries", "$db_type", "$channel", "mail.inc", "MARK_NEW", "node.php", "core.php", "$profile", "file.inc", "menu.inc", "watchdog", "db_query", "db_error", "form.inc", "DB_ERROR", "node_add", "php_help", "xrds.inc", "book_toc", "cron.php", "index.php", "valid_url", "map_month", "menu_tree", "image.inc", "db_result", "batch_get", "file_move", "_db_query", "file_copy", "SAVED_NEW", "batch_set", "check_url", "conf_path", "base_path", "conf_init", "user_pass", "user_user", "user_mail", "upload_js", "user_perm", "user_init", "user_help", "user_save", "user_load", "user_menu", "color.inc", "color.inc", "user_edit", "user_page", "user_view", "hook_boot", "hook_user", "hook_form", "hook_view", "hook_mail", "theme_box", "pager.inc", "hook_load", "index.php", "$base_url", "$language", "cache_set" ...

On API sites with a large doc base/code base to search, like drupalcontrib.org (where many, many contributed modules are included in the API docs), this makes the search box quite sluggish and unresponsive, or at least gives less than optimal performance.

Wouldn't it be a great usability and performance enhancement for api.module to load, on the autocomplete's AJAX request, only those JSON-encoded symbols that actually match the entered query? Of course, the more targeted JSON set would have to be loaded from the server first, which may cancel out any JS performance gain. So perhaps it would make sense to preload all very short symbols (say, 1 to 5 characters, like 'l' and 't') on page load, and to load a smaller JSON set containing all possible matches only once at least 3 characters have been entered? That way the search data is already present on almost every keystroke, while the JS search through the results stays very fast thanks to a reduced, semi-targeted JSON set, even on large doc bases.
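To make the proposal concrete, here is a minimal sketch of how the symbol list could be split into a preloaded tier and per-prefix sets. It is written in Python purely for illustration and is not api.module code; the function name build_tiers, the parameter names, and the choice to lowercase the prefix keys are all assumptions.

```python
from collections import defaultdict


def build_tiers(symbols, max_short_len=5, prefix_len=3):
    """Split symbol names into a preloaded tier and per-prefix buckets.

    Hypothetical helper: the thresholds follow the numbers suggested in
    the post (1-to-5-char symbols preloaded, targeted sets from 3 typed
    characters on).
    """
    # Drop duplicate names and sort by length, then alphabetically.
    unique = sorted(set(symbols), key=lambda s: (len(s), s))

    # Tier 1: very short symbols, small enough to ship on page load.
    short_tier = [s for s in unique if len(s) <= max_short_len]

    # Tier 2: one bucket per prefix, holding every symbol that can still
    # match once prefix_len characters have been typed.
    buckets = defaultdict(list)
    for s in unique:
        if len(s) >= prefix_len:
            buckets[s[:prefix_len].lower()].append(s)
    return short_tier, dict(buckets)


symbols = ["l", "t", "st", "url", "arg", "theme", "user_load",
           "user_save", "user_menu", "db_query", "db_error"]
short_tier, buckets = build_tiers(symbols)
```

With this split, typing "us" is answered from the preloaded tier only, while typing "use" needs just the one "use" bucket instead of the whole symbol list.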

What do you think?
Does that make sense?

How much work would it be?
Would one of the api.module developers be able and willing to do that?
Otherwise, would a patch have a chance of being committed, and where in the code would one start implementing this optimization?

cheers,

daniel

#1

drumm - November 7, 2009 - 20:48

AJAX is incredibly slow and doesn't keep up with user typing. For api.drupal.org, the data set is small and the current implementation makes a lot of sense. I want to expand that site to cover contrib, but it doesn't now, and I haven't heard much from the people running sites like drupalcontrib.org.

The requirements are:
- As much as possible, load the database before user typing, not after.
- Keep the database in static files; this avoids Drupal and MySQL latency, and the files can be heavily optimized on the web server.

The tiered approach you mention above is interesting. The initial database is everything under N chars and there is another database for prefixes of M chars, where M < N. Once M characters are typed, the secondary database is loaded, before the initial database runs out of results.
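A rough sketch of that overlap, in Python for illustration only: given what has been typed so far, it lists which static databases can still contribute results. The function name, the "initial" and "prefix:" labels, the one-file-per-prefix layout, and the default values N = 6 and M = 3 are assumptions, not anything api.module does today.

```python
def databases_needed(typed, n_initial=6, m_prefix=3):
    """Return which static databases can answer the text typed so far.

    n_initial (N): the initial database holds symbols shorter than N chars.
    m_prefix (M): a secondary database exists per prefix of M chars.
    """
    if m_prefix >= n_initial:
        raise ValueError("M must be smaller than N")

    needed = []
    # The initial database can only match while the input is shorter than N.
    if len(typed) < n_initial:
        needed.append("initial")
    # From M typed characters on, the matching prefix database applies.
    if len(typed) >= m_prefix:
        needed.append("prefix:" + typed[:m_prefix].lower())
    return needed
```

Because M < N, there is a window (here 3 to 5 typed characters) in which both databases apply, so the secondary file can finish loading while the initial one is still producing results.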

Another approach is siloing databases by project and/or popularity. The most likely matches (the project you are looking at, or overall popular searches) can be loaded first. Less likely matches get lower priority.
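As a sketch of how silo priority could be decided (Python, illustration only): the project currently being viewed comes first, then the rest by popularity. The silo structure, the field names, and the popularity numbers are made-up placeholders, not real usage data.

```python
def load_order(silos, current_project):
    """Order silos: the current project first, then most popular first."""
    return sorted(
        silos,
        key=lambda s: (
            s["project"] != current_project,  # False sorts before True
            -s["popularity"],                 # higher popularity earlier
            s["project"],                     # stable tie-breaker
        ),
    )


# Placeholder values purely for illustration.
silos = [
    {"project": "views", "popularity": 900},
    {"project": "cck", "popularity": 700},
    {"project": "token", "popularity": 300},
]
order = [s["project"] for s in load_order(silos, current_project="token")]
```

Here someone browsing the token documentation would get the token silo first, followed by views and cck.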

At this point, the biggest help would be putting real data behind the potential approaches. How big are the files? How well do they gzip? How do users use the search?
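The first two questions could be answered with something as small as the following Python sketch, which reports the raw and gzipped size of a symbol list encoded as compact JSON. The function name json_sizes is hypothetical, and the compact separators are an assumption about how the file would be written; real numbers would have to come from an actual export of a site's symbol list.

```python
import gzip
import json


def json_sizes(symbols):
    """Return (raw_bytes, gzipped_bytes) for a list of symbol names."""
    # Compact encoding without spaces after the commas.
    raw = json.dumps(symbols, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Repetitive sample input; a real measurement needs a site's full list.
sample = ["user_load", "user_save", "user_menu"] * 200
raw_size, gz_size = json_sizes(sample)
print(f"raw: {raw_size} bytes, gzip: {gz_size} bytes")
```

Running the same function on each candidate file (full list, short tier, single prefix bucket, single project silo) would give comparable figures for the approaches discussed above.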

 
 
