Last updated July 19, 2011. Created by iimitk on October 30, 2008.
Edited by stfwi, samphors_suy70, aj045. Log in to edit this page.

In the programming world, it is a race. It's a race to see who can write code that runs the fastest. Not just the quickest code, but the most useful code. And if you're a programmer, who then is your opponent? Yourself, of course! As programmers, we seek to write code that not only does something, it does something well. We refer to this "wellness" as efficiency. Efficiency is often derived from both speed and purpose. Together, the two must thrive harmoniously. As in the case of Drupal, speed is of great importance because we do not want the user to wait or have the server timeout. Purpose is equally important because Drupal is really nothing if there is nothing useful it can provide to it's audience.

Here is one way to think about: a tangible product. Say you own a company that produces product B. Your company, like any other company that produces a product, is being challenged: do you create a lot of product B really fast, or do you make product B extremely useful. In a world with no boundaries we'd do both to maximum extent possible. Unfortunately, manufacturing processes for product B do not support both ideas simultaneously, hence a boundary. The company will then spend time and effort to figure out where the "perfect" balance is. Balance is struck by realizing certain conditions: price of components, labor, value, and many other factors. Thus, once enough research has been done, the company adjusts it's manufacturing processes to produce the most of product B while at the same time giving purpose and quality to product B.

The parallels in the programming world are strikingly similar. You want your program to serve a purpose (the purpose is your "product B"). You also want the program to provide incredible speed and equally incredible purpose (and we'll assume quality is part of purpose). However, due to limitations of modern technology, you can not have both to their fullest extent simultaneously. Thus, you must find a balance between the two because you cannot have one or the other: speed alone is nothing without purpose, and purpose is nothing if it is not delivered speedily. That-- the finding of that balance-- is what the pages under this chapter are concerned about.

Topics introduced here will relate to easily understood problems, which will be described as well. Topics and problems include: efficiency of recursion, working with a set of unrelated data, and working with large sets of data, among other things. Problems in programming can often very easily be boiled down to a simple question, yet the simplest question can some times require a lot of thinking to solve.

If you have a topic relating to efficiency that you'd like to see here, feel free to post a comment under this page, jump on IRC and mention it, or even the add the page yourself, if you feel knowledgeable on the topic.

Examples

Try to combine database queries instead of requesting in a loop

This example shows a comparison of two codes with database queries, which both return information about users as array of standard class objects. Each element looks like e.g.:

[...]
[5] => stdClass Object
        (
            [uid] => 5
            [name] => Rincewind
            [roles] => Array
                (
                    [0] => administrator
                    [1] => editor
                    [2] => member
                )
        )
[...]

As every Drupal installation has users, this should be a testable example for all setups. There are of cause more ways to optimize database queries, this is not the main point here, we focus here on multiple queries.

In the first implementation, this is realized by fetching names and uids from all users, iterating through the result array and asking the database for the role names that match the uid.

<?php
for ($i=0; $i<10000; $i++) {
   
// Get the users with name and uid
   
$users = db_query("SELECT uid, name FROM {users}")->fetchAll();
    foreach (
$users as $key => $user) {
       
// Get the names from the role table
       
$user_roles = db_query(
           
"SELECT name FROM {role} WHERE rid IN
                      (SELECT rid FROM {users_roles} WHERE uid=:uid)"
,
            array(
":uid" => $user->uid)
        );
       
// Transfer the roles into the user representation object:
       
$user->roles = array();
        foreach (
$user_roles as $role) {
           
$user->roles[] = $role->name;
        }
    }
}
// Doing all this 1for 20 users took a laptop 2376.4ms.
?>

Here an improved code that implements two single database queries and then combines the data. It firstly retrieves all names and uids. Secondly the role names and uids are retrieved (note that the uids are not unique in the result, ther is an {uid, role_name} pair for every role that a user has.

<?php
for ($i=0; $i<1000; $i++) {
   
// Get the users with name and uid
   
$users = db_query("SELECT uid, name FROM {users}")->fetchAllAssoc("uid");
   
// Get uids and role names as one result containing the columns "uid" and "name",
   
$roles = db_query(
       
"SELECT u.uid, r.name FROM {users_roles} AS ur
            LEFT JOIN {users} AS u ON u.uid=ur.uid
            LEFT JOIN {role} AS r ON r.rid=ur.rid;"
   
);
   
// Initialize the user object, so that we don't need to generate the
    // role array dynamically in the next loop ...
   
foreach ($users as $key => $user) {
       
$users[$key]->roles = array();
    }
   
// Now iterate through all role names and add the role names to the
    // $roles array of the user with the corresponding uid ...
   
foreach ($roles as $uid => $role) {
       
$users[$role->uid]->roles[] = $role->name;
    }
}
// Doing all this for 20 users took a laptop 539ms.
?>

Also possible would be three queries: one for the users (associative with key uid), one for only the role names (assiciatice {key => value} pairs: {rid => rolea name}), and one for the role ids of all users.

<?php
for ($i=0; $i<1000; $i++) {
   
// Get the users with name and uid
   
$users = db_query("SELECT uid, name FROM {users}")->fetchAllAssoc("uid");
   
// Get the roles with name and rid
   
$roles = db_query("SELECT rid, name FROM {role}")->fetchAllKeyed();
   
// Get uids and rids
   
$users_roles = db_query("SELECT uid, rid FROM {users_roles}");
   
// Initialize the user object, so that we don't need to generate the
    // role array dynamically in the next loop ...
   
foreach ($users as $key => $user) {
       
$users[$key]->roles = array();
    }
   
// Now iterate through the n-to-n correspondency table and assemble
    // the roles
   
foreach ($users_roles as $user_role) {
       
$users[$user_role->uid]->roles[] = $roles[$user_role->rid];
    }
}
// Doing all this for 20 users took a laptop 538ms.
?>

Summary: The first code take 5 times longer than the other two, means that you should try to avoid queries in loops. Instead, build bigger queries and send them at combined. This applies as well for INSERT INTO or DELETE FROM queries with multiple rows.

Take care with getters and setters

Coming from object oriented programming languages like Java, C++, C# etc people tend to proper class implementations with instance variable ("property") encapsulation. The properties can only be accessed using getters and setters, which allows elaborated exception handling and conditioning of the input or output. So far, so good. Now the bad thing: Calling methods, such as getters and setters costs a lot of performance in PHP. Let's make the test:

Here a class with encapsulated private properties. We create an instance and perform some read/write operations:

<?php
// A class with private instance variables, getters and setters
class PT_Contact1 {
    private
$name = "";
    public function
getName()
    { return
$this->name; }
    public function
setName($name)
    {
$this->name = $name; }
}
// Lot of Set/Get operations
$person = new PT_Contact1();
for (
$i=0; $i<1000000; $i++) {
   
$person->setName("Rincewind");
   
$name = $person->getName();
}
// Takes 1617.6ms
?>

In contrast, here a code which uses direct public property access:

<?php
// A class with public instance variables, no getters and setters
class PT_Contact2 {
    public
$name = "";
}
// Lot of Set/Get operations
$person = new PT_Contact2();
for (
$i=0; $i<1000000; $i++) {
   
$person->name = "Rincewind";
   
$name = $person->name;
}
// Takes 573.3ms
?>

Summary: The direct access is more then twice performant as the encapsulation. It's a shame, but it means you have to check where you need proper encalsulation and where not, or better said where you have a lot of accesses. PHP does not optimize it for you (tested with version PHP5.3).
One (half) way out of the encapsulation misery is to make properties private when you realize you need it and then implementing the magic methods __get(), __set(), etc.

Looking for support? Visit the Drupal.org forums, or join #drupal-support in IRC.

Comments

[Criticism withdrawn.]

Nice direction, but actual code examples might be needed, maybe add a basic example of an efficient vs an inefficient database query?

- - turpana.com - -

I will try to collect some other examples, would be nice if an english native could review the examples text.