Optimize the performance of condition evaluation part of rule sets

By Saubhagya on 4 Apr 2010 at 08:59 UTC

Overview:

In the rule module, rule sets contain several conditions and corresponding actions. Condition checking algorithm of the rule module is very basic. The performance for large rule set of this algorithm is not satisfactory. I want to change the condition evaluation part in a decision tree way to optimize the performance.

Description:

The strategy of rule evaluation in rule module is ECA (event-condition-action). In this method, on the occurrence of the event the condition parts of all the rules defined by that event are evaluated sequentially, based upon the priority (weight) given to the rules and the outcome of this (TRUE or FALSE) decides the actions to be taken.

This approach seems correct if the number of rules in rule set is less, but as the number increases the performance of this scheme declines. Troubleshooting part is condition evaluation part. Condition part of any rule is generally made up of several basic conditions (like content has type, compare two users) and their negations joined up through logical operators (AND, OR). These basic conditions repeat several time in a given rule set dictated by an event. In ECA, we compute the same result of basic conditions again and again.

To optimize the performance of this condition evaluation part I want to implement a decision tree like structure. The basic conditions will be given by the nodes of the tree, which on evaluation will direct the result to one of the two branches (TRUE or FALSE). From the leaves of the tree there will be lists attached containing the actions to be taken.

The conditions will be checked at every level of the tree. The output of one condition will direct the sequence to one of the sub-tree through its branch upon evaluating to be TRUE or FALSE. Sequence will end at one of the leaf node containing the list of the actions to be performed in-order of the priority. So the highest priority event will occur first then the next one in this sequence. User will define the rule sets containing conditions and corresponding actions and the tree will be generated. This pre-processing step might take some time but the speedup in the overall functioning will much higher. This generated tree can easily be reordered with the addition of rules.

The scheme I introduced at its worst case evaluates all exclusive (not dependent) conditions of a rule set on occurrence of an event. This is optimal performance as expected. And this makes a huge difference to the scheme rule module currently follow. In this approach the weights of different rules are considered. Here I showed only the conditions made up by basic conditions joined up by AND operations. Conditions containing OR operations can also be converted in this format. As:

(A ˅ B) ˄ C → X can be stated as
A ˄ B → X, B ˄ C → X that means if any one of these conditions is satisfied, result will be X

Here, ˄ = AND
˅ = OR
! = NOT

Left part of arrow (→) denotes the condition part and right part is corresponding action to be taken. Accordingly A, B and C all are basic conditions and X is the action.

Moving upon the same terminology, consider these rules are associated with an event.

Condition → Action...Priority

(! A) ˄ B ˄ C → V.......0
A ˄ C → W................2
C → X......................0
A ˄ (! C) → Y............1
(! A) ˄ B → Z............3

Here, all the conditions have same template (A B C) this step saves the part of finding inter-dependencies within the different conditions.

To make a single decision tree, we need to get the set of maximum exclusive conditions. This is {A, B, C} so the tree generated will have 3 levels of conditions (at max.). Generated tree will look like this,

------------------------------------A

-------------------------C………………………………….B

----------------W, X………………… Y-------C……………………….C

-------------------------------Z, X, V………….. Z-----X

Siblings are connected through dots. Dashes are for indentation.
Node A has two children C and B as left and right. Similarly left child of A (C) has two children. Left contains to element (W, X) and right has one element Y. Here, nodes A, B, C show the basic conditions and other nodes are the actions. If the condition is evaluated to be TRUE, the left branch is taken. Otherwise right branch is taken. At the action node the actions are taken in sequence. Rules having different weights can be handled easily through this method.

Extended rule of C → X can be represented as

A ˄ C → X
(! A) ˄ B ˄ C → X
(! A) ˄ (! B) ˄ C → X

This also gives the same result. This is why X is inserted at three places. Similarly we can look at other rules.

The actions are inserted with their priority in action node. So if A comes out to be FALSE and B, C are TRUE then Z, X and V are taken in the sequence. With the insertion of any new rule tree can be reordered.

My work will be in three phases. In the first phase I’ve to find all the basic conditions. Some conditions can be derived from the existing conditions. As, if a node is of one type, it can’t be of any other type. Conditions having OR in their condition part will be converted in the form of AND as illustrated above. This is done, because only this form can be implemented in decision tree.

In the second phase the result obtained from the first phase will be extended. The reordering of the rules is done. So their condition parts will in same sequence (like A B C in the example). This step will remove the necessity of finding the dependencies in between different conditions. In the same time the basic rules will be extended, as I stated. In the extended rules the template of condition part will be in form of that of the decision tree, where these will be inserted. So the insertion will be quite easy.

In the final phase, the real power of the algorithm decision tree will be implemented. Actions of different weights can be ordered according to their weight in the action nodes. This solves the problem of rules having different weights.

Timeline:

* Till May 24th - I’ll be familiarize with the basics of existing work and will try to get reviews from the Drupal community.
* May 24th to 31st - I’ll do some CVS practice and make a general framework about the detailed algorithm to be used.
* June 1st – 21st – Code the first phase, as stated above.
* June 21st - July 12th – Code the second phase of the project and in the same time test the work of the first phase.
* July 12th – Midterm submission and I’ll make a short term strategy for the future work.
* July 16th - August 2nd – Code the third phase of the project.
* August 2nd - August 20th – I’ll brush-up the final code and try to cover documentation part.

Mentor:

Wolfgang Ziegler (fago).

Contact Details:

IRC nick-saubhagya

Difficulty:

Medium

Comments

=

bwv commented 4 April 2010 at 10:01

May I make a suggestion? You might want to make this a book page so it does not get lost amid all the regular forum nodes.

Optimize the performance of condition evaluation part of rule sets

Comments

=

New forum topics

News items

Our community

Documentation

Drupal code base

Governance of community