problem: drupal_is_denied() run a query that return anonymous result set, which will cause problem in MSSQL db_query_range() implementation.

detail: in case of MSSQL, db_query_range() is implemented as nested query. therefore a anonymous result set will cause an exceptional case and terminated with error. this patch assign a label to the result set, and so solve the problem.

P.S. here is the current implementation of db_query_range(), within MSSQL driver:

<?phpfunction db_query_range($query) {
  $args = func_get_args();
  $count = array_pop($args);
  $from = array_pop($args);
  array_shift($args);

  $query = db_prefix_tables($query);
  $query = _db_query_rewrite_temp_tablename($query);
  $query = preg_replace_callback(ANSI_RESERVED_REGEXP, '_db_query_callback_ansi_reserved', $query);
  $query = preg_replace_callback(MSSQL_RESERVED_REGEXP, '_db_query_callback_mssql_reserved', $query);
  $query = preg_replace_callback(MSSQL_MAX30_REGEXP, '_db_query_callback_mssql_max30', $query);
  if (isset($args[0]) and is_array($args[0])) { // 'All arguments in one array' syntax
    $args = $args[0];
  }
  _db_query_callback($args, TRUE);
  $query = preg_replace_callback(DB_QUERY_REGEXP, '_db_query_callback', $query);
  $query = preg_replace('/SELECT/Dsi', 'SELECT TOP(100) PERCENT', $query);
  $query = 'SELECT * FROM (SELECT sub2.*, ROW_NUMBER() OVER(ORDER BY sub2.line2) AS line3 FROM (SELECT 1 AS line2, sub1.* FROM (' . $query . ') AS sub1) as sub2) AS sub3 WHERE line3 BETWEEN ' . ($from + 1) . ' AND ' . ($from + $count);
  return _db_query($query);
}
?>

Comments

hswong3i’s picture

Status: Needs review » Needs work

thanks for Zlender report (http://drupal.org/node/165160#comment-289474), we may also need to update this query as:

SELECT CASE WHEN status=1 THEN 0 ELSE 1 END FROM {access} WHERE type = '%s' AND (LOWER(mask) LIKE LOWER('%s') OR LOWER('%s') LIKE LOWER(mask)) ORDER BY status DESC

BTW, this may be not function within DB2... i will spend some time for cross database compatibility testing later :)

mooffie’s picture

>
> [...] which will cause problem in MSSQL db_query_range() implementation [...]

I don't think you should bother yourself with db_query_range() right now. Pretend for the meantime that this query is a simple db_query(). The problem is that the query doesn't do what it's supposed to do.

> SELECT [...] FROM {access} WHERE type = '%s' AND
> (LOWER(mask) LIKE LOWER('%s')
> OR LOWER('%s') LIKE LOWER(mask)) [...]

I think the original version, before the patch, was the correct one. It sould only be LOWER('%s') LIKE LOWER(mask).

An example:

The {access} table contains rules for, e.g., IP access. These rules are stored as SQL wildcards in the 'mask' column of the table. When Drupal wants to check if IP '194.14.56.98' has access to our website it does:

[...] WHERE LOWER('194.14.56.98') LIKE LOWER(mask)

It's logical. You say that DB2 dosn't support "text LIKE column", but switching the two won't solve the problem: this changes the logic to one which isn't the right one.

Maybe there's no options but to put a conditional that executes a different, suitable query when DB2 is used.

(BTW, I think there's an error in the documentation for drupal_is_denied(): I don't think the $mask parameter ever contains SQL wildcards.)

hswong3i’s picture

after a indeed studying about limitation of DB2, i found that is almost impossible to implement a similar feature without changing program logic (http://www.webservertalk.com/archive220-2005-11-1272872.html):

The problem is that you can't use columns on the right side of the LIKE predicate. Everything on the right must be pre-determined, i.e. constant, during the query execution and columns do not contain the same value for each row. So you're out of luck.

so here are some proposes

  1. don't change program logic. since DB2 driver will use @db2_execute(), therefore error message will not show out, and simply return a NULL result set. this means the allow/deny checking will be by-passed within DB2 implementation. if this is our choice, all we need is document this risky within DB2's README, and roll back the patch
  2. re-factor programming logic, by query level regex. some database (e.g. Oracle) support query level regex, which is similar as our current implementation. BTW, it is also a database dependent feature, so this shouldn't be our choice
  3. re-factor programming logic, by PHP level regex, which similar as mailman allow/deny handling. benefits includes: 1. we will able to implement the logical checking in a much flexible way, e.g. we can first allow all case, and then deny some specific case (similar as apache allow/deny handling); 2. we don't need to save the compare rules in SQL format (directly) in database, which may cause a security back door (SQL injection). BTW, the trade-off is very simple: the performance, as we need to implement the logical checking in a muh complicated way

from my point of view, we shouldn't save compare rules in SQL format, which may cause SQL injection. on the other hand, if this is a database dependent implementation, we should consider about re-implement it. so i will suggest re-factor it with case 3 (which shouldn't be too difficult). BTW, if performance is our main concern, just feel free to disable such feature within DB2 :)

hswong3i’s picture

just brain storming: is it possible to use preg_match + cache system for allow/deny filtering?

e.g. rules are written in perl regex format if starting with "^", or else written in plan text for complete word mapping. this is the similar case as mailman and some other system's allow/deny filtering method.

on the other hand, the regex patten will save within cache system, as like as variable. before calling drupal_is_deny(), we will initialize those regex for compare (on the other hand, build the regex and save into cache if not yet exists).

so the overall procedures will be limited into: (1 query: access cache init) + (2 preg_match: for allow and deny). i guess it shouldn't trade off too much performance, but this should be proved by benchmarking :)

mooffie’s picture

>
> so here are some proposes
> [...]
> re-factor programming logic, by query level regex.

BTW, how hard it would be to put code in DB2's db_query() that replaces any /LIKE (\w+)/ with the correct syntax? I mean, something similar to this imaginary line:

$query = preg_replace('/([\w.]+) LIKE (\w+)/', "regexp(\\1, \\2, replace(\\2, '%', '.*'))", $query);

(Since \w+doens't match quotes, in would only match LIKE'ing agains columns.)

Could you please submit a patch that restores the original line in drupal_is_denied() ?

hswong3i’s picture

StatusFileSize
new4.8 KB

@mooffie: i think the patch for rollback won't be too difficult. we just need to shift the position of 'mask' and '%s', and it will function as before.

here i try to introduce a patch that change matching rule into regex format, as like as mailman's sender/recipient filters. so rules will written as:

  1. complete phase: the phase will use directly for regex mapping. all regex elements (e.g. . \ + * ? [ ^ ] $ ( ) { } = ! < > | :) will be escaped by preg_quote. e.g. user rule with "dummy" will only map with user name "dummy"
  2. phase starting with '^': the phase will act as regex expression for mapping. e.g. host rule with "^.*drupal.org" with map for both "www.drupal.org", "ftp.drupal.org", etc

on the other hand, it clone the variables caching handling: the processed regex expression will saved into cache table, and initialize once before calling drupal_is_denied() within drupal_bootstrap(), if it is not existing.

total processing will limited into: 1 simple select query for cache_get() + [numbers of query and procedure for cache rebuild (optional)] + [1 preg_match() for deny rules (optional)] + [1 preg_match() for allow rules (optional)]. it shouldn't be much performance different with existing implementation.

for backward compatibility, we can use a script to translate existing rules into new format: "%" can replace as ".*", where "_" can replace as ".". i will work out that translate script within *.install, after this patch's review :)

hswong3i’s picture

Title: update drupal_is_denied as SQL friendly » use regex for drupal_is_denied() checking
Category: bug » feature
Status: Needs work » Needs review
hswong3i’s picture

StatusFileSize
new6.05 KB

@killes: thanks for your comment. here is an updated version with more document and comment.

hswong3i’s picture

StatusFileSize
new6.05 KB

minor update, due to document miss typing :(

moshe weitzman’s picture

this does not look particularly simple or elegant to me.

hswong3i’s picture

@moshe: i hope this presentation shouldn't be too difficult for system administrators and end users, if they have expreian

more over, this patch comes with numbers of benefit:

  1. it is very closed with Apache mod_setenvif (http://httpd.apache.org/docs/2.0/en/mod/mod_setenvif.html) and mailman sender filter (http://www.washington.edu/computing/mailman/owners/regexp.html#use) handling
  2. this presentation don't change any backend handling (admin/user/rules) and database structure. by using some simple script, we will also able to translate the our old access rules into new format
  3. using regex is much powerful and general than using SQL matching, in case of allow/deny rules definition. administrators will able to define rules in a better and detail format
  4. by using regex, we can also escape from the risk of SQL injection, when compare with using SQL matching directly
  5. with the help of combine handling with cache system, query is now much simple when compare with original implementation (a complicated LIKE + CASE + ORDER BY + db_range_query())
  6. as regex pattern is cached, it will be much scalable when compare with original implementation: original query will need to look around all rules within {access} table; in case of this patch, we ALWAYS fetch data from ONE row ONLY (from {cache} table)
  7. the most important benefit: it is a database independent implementation :)

i will try to clone mailman's "non-regexp matches are always done first" handling in coming on version (quote from mailman sender filter setting page):

In the text boxes below, add one address per line; start the line with a ^ character to designate a Python regular expression. When entering backslashes, do so as if you were using Python raw strings (i.e. you generally just use a single backslash).

Note that non-regexp matches are always done first.

hswong3i’s picture

StatusFileSize
new6.63 KB

version 0.3 includes:

  1. feature add: update _drupal_access_init(), so non-regexp matches are always done first. (clone from mailman)
  2. bug fix: update position of calling cache_clear_all(). they will now called just after {access} table is altered by INSERT/UPDATE/DELETE

here is a simple example. for database with following rules defined:

mysql> SELECT * FROM access;
+-----+----------------+------+--------+
| aid | mask           | type | status |
+-----+----------------+------+--------+
|   1 | google.com     | host |      0 |
|   2 | ^.*drupal\.org | host |      0 |
|   3 | download.com   | host |      0 |
|   4 | drupal.org     | host |      0 |
|   5 | demo1          | user |      1 |
|   6 | demo2          | user |      1 |
|   7 | ^demo.*        | user |      1 |
+-----+----------------+------+--------+
7 rows in set (0.00 sec)

access rule will preprocessed and cached in $access as:

<?php
Array
(
    [host] => Array
        (
            [deny] => /^google\.com$|^download\.com$|^drupal\.org$|^.*drupal\.org/Ds
        )

    [user] => Array
        (
            [allow] => /^demo1$|^demo2$|^demo.*/Ds
        )

)
?>

and so rules will used for preg_match() directly without any changes :)

dries’s picture

Because this query is used on _every_ page view, we spent a lot of time optimizing it. In Drupal 5, we managed to improve performance of cached pages by more than 5% by making a tiny improvement to this query. In other words, if the proposed changes make this query slower, it can result in a significant performance loss. We'll want to benchmark this change.

I agree with Moshe that this patch isn't particularly elegant, but let's first see what benchmarks tell us.

hswong3i’s picture

StatusFileSize
new3.64 KB

i just complete a simple benchmarking testing with CVS HEAD, regex patch (no cache), and regex with cache, with the following drupal setup:

  1. user: 2000
  2. node: 5000
  3. comment: 10000
  4. vocabulary: 15
  5. terms: 250
  6. rules: random generate with 8 digits (50% normal format, 50% starting with '^')
  7. others: turn on all module + devel

all other details are following with "HowTo: Benchmark Drupal code": http://drupal.org/node/79237. here is the benchmarking result with 0/50/100/150/200/250/1000 rules. each cycle test for 500 page hits:

                +-----+-----+-----+-----+-----+-----+------+
                | 0   | 50  | 100 | 150 | 200 | 250 | 1000 |
+---------------+-----+-----+-----+-----+-----+-----+------+
| CVS           | 307 | 303 | 309 | 312 | 311 | 309 | 306  |
+---------------+-----+-----+-----+-----+-----+-----+------+
| regex         | 308 | 302 | 308 | 307 | 305 | 306 | 318  |
+---------------+-----+-----+-----+-----+-----+-----+------+
| regex + cache | 306 | 306 | 301 | 308 | 306 | 309 | 310  |
+---------------+-----+-----+-----+-----+-----+-----+------+

according to above result, we can found that regex patch for drupal_is_denied() (with cache handling) won't drop a lot of performance: at least it preform as fast as existing implementation. on the other hand, as discussed above, the regex patch comes with numbers of benefits: a more powerful matching rules, and it is database independent. hope this benchmarking result can helps about our decision :)

hswong3i’s picture

StatusFileSize
new6.82 KB

minor update: code style clean up and document update. it should now seems better :)

moshe weitzman’s picture

does i makes sense to store the access rules with variable_set() and get rid of the access table? that way you get the cache set/get for free, and simplify some code.

hswong3i’s picture

@moshe: i think variable_get/set may be not suitable in this cases:

  1. variables will first initialize by variable_init() in DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE, which is after DRUPAL_BOOTSTRAP_ACCESS. as variables are not yet initialized, nothing we can fetch from variable_get()
  2. it is not suitable to initialize variables on or before DRUPAL_BOOTSTRAP_ACCESS. since this stage will run for both cached or non-cached version, we shouldn't increase workload within this stage. on the other hand, we don't need those variables within this stage, so that is no point to initialize them
  3. queries for _drupal_access_init() are lightweight enough and suitable for DRUPAL_BOOTSTRAP_ACCESS. unless rules are changed, we only required for a simple SELECT query called by cache_get(), for access rules initialization. it is much better when compare with existing implementation, and able to decrease database workload
hswong3i’s picture

StatusFileSize
new13.78 KB

minor update: also fix bug with INSERT statement, which manually insert aid into "serial" field (cause error), on the other hand uselessly fetch last insert id after INSERT.

hswong3i’s picture

StatusFileSize
new7.03 KB

correct version

hswong3i’s picture

Status: Needs review » Needs work

wait! i will try to standardize access rules handling, as like as variables. therefore we may also implement Apache-like access/deny rules handling for block display, or some other cases :)

hswong3i’s picture

Status: Needs work » Needs review
StatusFileSize
new13.02 KB

feature add version, which standardize access checking into Apache mod_authz_host style (http://httpd.apache.org/docs/2.2/mod/mod_authz_host.html):

  1. add access_get() (replace drupal_is_denied()). ordering can optionally defined. types are not restricted in 'host', 'mail' and 'user' only, so we may able to apply this access checking into some other location (e.g. block visibility setting)
  2. add access_set() and access_del(). isolated handling as like as variable_*()
  3. update user.module into new style

TODO: now we can filter IP address by: 1. full IP address format, or 2. standard regex format. i would like to implement the following IP filtering formats within coming on version (implement translation within access_init()):

  1. A partial IP address (e.g. 10.1)
  2. A network/netmask pair (e.g. 10.1.0.0/255.255.0.0)
  3. A network/nnn CIDR specification (e.g. 10.1.0.0/16)
hswong3i’s picture

Title: use regex for drupal_is_denied() checking » enhance drupal_is_denied() checking
StatusFileSize
new16.53 KB

i clone most handling from Apache mod_authz_host, unless partial domain-name notation (as it will conflict with plain text checking). now the access checking support the following formats:

  1. regex: mask starting with '^' (partial domain-name notation should also written in regex, e.g. ^.*drupal\.org)
  2. ip address: full, partial, network/netmask pair, and network/nnn CIDR specification
  3. plain text (complete phase match)

NOTE: i keep the rules caching system and simplify the access rule checking, so it should keep the same performance as previous benchmarking result.

hswong3i’s picture

StatusFileSize
new16.37 KB

simplify full IP address handling.

hswong3i’s picture

StatusFileSize
new17.33 KB

minor update:

  1. enhance access_del(), so we will able to delete a type of rules (it will be useful if we apply this access checking system to block visibility check)
  2. bug fix user_admin_access_delete_confirm()
hswong3i’s picture

StatusFileSize
new16.77 KB

minor update: bulky code cleanup, minor bug fix, document update.

hswong3i’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new15.45 KB

i try to minimize the change to existing API in this version, remove access_set() and access_del(). it is now database independent, with following additional features:

  1. plain text filter
  2. regexp filter (starting with "^")
  3. 4 different type of IP address representation: full, partial, network/netmask pair, and network/nnn CIDR specification
  4. configurable checking behavior: "allow,deny" or "deny,allow" (default)

BTW, as the function is now trying to check for access allow, it is better to change its name from drupal_is_denied() into drupal_is_allowed(). anyway, if we are very serious about "not changing API after code freeze", i can create a newer version that keep it as drupal_is_denied() (just some simple "NOT" handling).

function is not complicated but powerful, and fully tested. it should able to be RTBC.

gábor hojtsy’s picture

Status: Reviewed & tested by the community » Needs review

Hum, please do not RTBC your own patches. Get others test it and mark RTBC if they think so. Peer review works this way. Also, this patch seems to be escalated from the original goals to lots of new features and changes, for which it seems to be quite late in the cycle to me. Not only the function name change is an API break, but the matching format change is such too.

hswong3i’s picture

Version: 6.x-dev » 7.x-dev
Priority: Critical » Normal
Status: Needs review » Active

@Gabor: sorry about that, as i just hope to catch more focus, and explore its possibility for D6 as i can. according to the counter patch for D6 beat in http://drupal.org/node/165160, i will keep this issue for D7, and so we can review this once again in the near feature :)

chx’s picture

Status: Active » Needs review
birdmanx35’s picture

Status: Needs review » Needs work

No surprise, but this no longer works against HEAD.

catch’s picture

Status: Needs work » Closed (duplicate)