Code review findings...

hass - August 3, 2009 - 18:35
Project:Web Links
Version:6.x-2.x-dev
Component:Contrib: Checker
Category:bug report
Priority:critical
Assigned:Unassigned
Status:closed
Description

Here comes a list issues I've found while reviewing the code to find out the differences between "linkchecker" and "weblinks_checker" and to get some understanding of what this module does and what not. Here are my findings and I'd only like to note that I do not like to open N-cases as I'm not using the module myself and only taken a look to support more linkchecker users... some of the findings might look like nit picking, but are nevertheless not core worthy. Take it as is - do not expect that I'm going to work on one of this issues, please.

1. $response->error may not be saved in database for the reason of encoding issues. See linkchecker code.
2. Many translatable string bugs. (double blanks, etc)
3. Code style bugs over code style bugs - I do not like to comment on all of them... uncountable - sorry.
4. A max link check count per cron is design wise not working well. 50 is not very much, the server will be able to get more checks done in normal cases, but not always... in rare cases 50 may otherwise be to much. Linkchecker have a solution and have this user confusing setting removed from UI...
5. Never surround %d with single quotes!
6. We have a database abstraction layer, please use it. Don't do this in D6: ... WHERE type='weblinks' ... or ... n.status=0....
6. Never make line breaks in translatable strings! This is not supported by core and POTX.
7. Use double quotes only if required.
8. Many context sensitive issues in translatable strings like: 'Weblinks moved !links from user !from to user !to.'
9. I would like to understand what "// Note that we have to try to bypass a core bug in valid_url()." means... What bug is this comment talking about? There is no link to an issue. It may be something that has been fixed in core some time ago... not sure.
10. I believe that $form['#node']->nid and if ($form['#post']['op'] == t('Delete')) { is not the Drupal D6 way...
11. Check doesn't use HEAD request method... maybe by design, but this is not the best idea for check performance reasons.
12. Status code 0 doesn't mean you are not connected to the internet. It could also mean the port of the remote machine is down. For e.g. you check https://example.com, but you cannot connect to port 443 or there is no certificate installed. This status code could have many many other reasons...
13. '</div><div class="clear-block"></div>' could cause theming issues in themes if the theme use a different class name.
14. Reading some of composed SQL statements is really difficult... I hope nobody would ever need to debug this.

Aside - why is this project not merged with Links package?

#1

hass - August 3, 2009 - 19:26

15. Usability - never use radio boxes for Enabled/Disabled or Yes/No - checkboxes are much more better... seen on page admin/settings/weblinks

#2

hass - August 3, 2009 - 19:46

16. An URL like http://example.com/weblinks have a last_status = 200 (saved in database), I'm getting an 404 status code.

#3

hass - August 3, 2009 - 19:51

URL field in a weblinks node is of type AREA and not INPUT. Therefore it's possibel to type more than one line like:

http://example.com/weblinks1
http://example.com/weblinks2

This becomes the link in output: http://example.com/weblinks1http://example.com/weblinks2

#4

NancyDru - August 3, 2009 - 22:55

We do use the Coder module to check against core guidelines (which the 22 flagged errors in Linkchecker shows it does not) and it passes.

Since you will not comment on the code style bugs and Coder has no problems, I can't fix them. I know that we follow the coding standards.

Checking for a max of 50, in my experience, would almost certainly result in a PHP timeout. However, if we switch to Linkchecker (and this criticism is not a good start), this would go away anyway, so why comment on it.

5. '%d' came from a large patch from one of our dedicated users and slipped by our review. I realize that Drupal considers it wrong, but I have had occasion to benchmark it on a non-Drupal site and it turns out that it is actually much faster to use the quotes. But, again, this would potentially go away.

2/6. We have contributed translations (despite double blanks and line breaks). Potx does support line breaks according to my communication with Gabor. If you can point out the double blanks, I would be happy to fix them.

7. This is left up to the developer's discretion according to the standards. We typically use double quotes on queries simply because they often contain single quotes; otherwise we use single quotes expect to make some strings with substitution easier (and load/compile faster).

8. I have no idea what this means. This is in code that would rarely ever be executed and would be seen only by an admin. Also it would not be in the stuff you would be replacing.

9. You are correct, I need to include the link. I know for a fact that it has not yet been fixed and, in all likelihood, will not be fixed in D6.

10. Educate me, please, because I saw that same kind of stuff in core modules just this morning.

11. I have no idea what you are talking about.

12. The list came from W3C.

13. "Clear-block" is a core standard class. No one has yet reported a problem with it. However, I am open to suggestions.

14. So tell me your suggestions...

It is not merged with Links for several reasons:

  1. The maintainer of Links has not responded to our attempts to communicate.
  2. We had 6.x support long before he did.
  3. We happen to think this is a better package.
  4. We do have a converter for those who want to move to Web Links. As a matter of fact, we have converters for all the major links packages. And I took over and provided a few last fixes for the LinksDB module then told the users that they gained a lot and lost nothing to convert.
  5. As you complain here, I find the Links package extremely difficult to follow internally, not to mention many Coder errors.

#1 We respectfully disagree. A single check box is not very clear, nor does FormsAPI render it consistently. Yes/no radios much more clearly show the user whether the option is in use. Core usage to the contrary, we will not change this.

#2 I will have to check on this. I know that http://example.com is properly handled.

#3 I don't see INPUT as valid in http://api.drupal.org/api/file/developer/topics/forms_api_reference.html...
Further, it would then fail validation.

#5

hass - August 5, 2009 - 21:01

I've ran coder on weblinks_checker and it gave me not many things, but also the ones I've seen in the review. It's mostly indention and tabs, leading/trailing spaces and so on. Only as side note - as you ran coder on linkchecker... *none* of the reported complains shown for linkchecker is correct. Two eyes need to verify these false alarms. This are all bugs of coder module... it's not perfect and I'm not sure if it sees everything that is not the core code style in weblinks :-). I haven't had the time yet to report the linkchecker coder bugs... #539984: Only invalid errors reported on linkchecker 2.3

Checking for a max of 50, in my experience, would almost certainly result in a PHP timeout.

My experience with link checker and a slow DSL line is - ~110-120 checks are realistic in normal cases within 240 seconds (default cron PHP timeout). The reason that you cannot check more than 50 may be that you are using the GET request method and not HEAD. But there is no guaranty that this high number is working well with drupal_http_request() and this is why the number of checked links depends on processing time in linkchecker since v2.1 + you have no idea how much time other modules eats. It also removed user complains that cron have failed and so on... I hope to get the cURL support implemented somedays... and than we can check let's say 1000 links in parallel per second if the hardware can handle this... I only need to find a site with 1.000.000 or more links to test :-)))

#5. It's not the Drupal way and may prevent database portability to other DB server types like MsSQL, Oracle and so on. See the SQL coding conventions, http://drupal.org/node/2497

#2/6. I've seen 2 or 3 double spaces... cannot say the line, but they are in there. If you find one place in D6 core about strings with line breaks - point me to it I will create a patch asap. Core splitted complex and extra long strings into multiple strings. For e.g.

Wrong:

t("This is an example.")

t('This is an example.
With another sentence
and another one.')

Correct:

t('This is an example. With another sentence and another one.')

$output = t('This is an example.')
$output .= t('With another sentence and another one.')

#7. Yes, but it's faster to use single quotes... I saw a few translatable strings surrounded with double quotes, but there was no single quote inside - so no need to use double quotes.

#8. The string 'Weblinks moved !links from user !from to user !to.' is one example of a context sensitive string that may be difficult if not impossible to translate in other languages for the reason that translators are not able to see in what context a string stands.

Wrong:

<?php
          drupal_set_message
(t('Weblinks moved !links from user !from to user !to.',
            array(
             
'!links' => format_plural($count, '1 link', '@count links'),
             
'!from' => $account->uid,
             
'!to' => $move_to))
              );
?>

Correct (untested - cannot verify now):

<?php
drupal_set_message
(format_plural($count, 'Weblinks moved 1 link from user @from-uid to user @to-uid.', 'Weblinks moved @count links from user @from-uid to user @to-uid.', array('from-uid' => $account->uid, '@to-uid' => $move_to)));
?>

See http://api.drupal.org/api/function/t/6 or the translation guide pages for more information.

#10. I'm not 100% sure, but I thought we use $form_state['values']* array only. There are a few exceptions, but this depends on the use case...

#11. About HEAD vesus GET, see http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol, please.

#12. You shouldn't trust the W3C rules - if PHP is the engine the code relies on. See http://www.php.net/manual/en/function.fsockopen.php

If the value returned in errno is 0 and the function returned FALSE, it is an indication that the error occurred before the connect() call. This is most likely due to a problem initializing the socket.

#15

We respectfully disagree. A single check box is not very clear, nor does FormsAPI render it consistently. Yes/no radios much more clearly show the user whether the option is in use. Core usage to the contrary, we will not change this.

Everyone knows very well that: checked boxes means enabled and unchecked means disabled. I would wonder if there is one place in core or one of the top 10 modules having such radios... aside we also changed WYSIWYG and it was planed for FCKEditor module to change from yes/now to checkboxes, too - for usability reasons. I have never seen a Windows or Linux Desktop Software using radios for enable/disable features. The UI looks very overloaded to me with yes/no radios. It's also easier to develop/maintain for you, too.

#16

I will have to check on this. I know that http://example.com is properly handled.

Yes it seems to be handled, but 200 is not the status code you receive from the example.com server... that's save to say.

#17. INPUT in Drupal means TEXTFIELD, see http://api.drupal.org/api/file/developer/topics/forms_api_reference.html...

Thank you for information about Links package... I wonder - Scott answered to me very quickly and he is also looking for co-maintainers... but I do not care much... I only wondered why people re-invent the wheel. Not my time... :-)

Only as a side note - I have added Weblinks node type support to linkchecker yesterday - whatever you plan/decide to do with the checker module. This change allows the linkchecker users to see all broken links in only one report what should be best - usability wise. But it doesn't handle moving links to other users...

#6

hass - August 12, 2009 - 21:37

Only as an additional note - if the download method GET is used and someone adds a link to a file download - let's say a 500 MB ZIP file - the module currently starts *downloading* this file with drupal_http_request(). This will fail after 15 seconds... (default timeout) and end up in a link that is marked as broken by the module, but it isn't it. Method HEAD will only check if the link to the file is OK, but do not download the ZIP file themself.

#7

NancyDru - August 12, 2009 - 23:09

Yes, I read the doc you sent me to and will be changing the module when I have time.

#8

jonathan1055 - August 16, 2009 - 16:00

Hello Hass,
Nancy has covered nearly everything, but I have just a few observations and additions:

1. $response->error may not be saved in database for the reason of encoding issues. See linkchecker code.
I don't think Nancy made any comment on this. We do not save the error in the weblinks table. Are you talking about the watchdog table?

2. There is only one place with a double space in a t string and this is after the '?' in the description for $form['weblinks_error_handling']['weblinks_checker_action_on_unpublished']. There are no double spaces in any translatable strings in .module

3. Code indentation. There is one line in .module (in weblinks_checker_cron where $extra is assigned) which has an extra space causing the indentation to be mis-aligned. I have not seen any others.

5. Single quotes around %d - there is only one occurrence of this, and is in my addition in weblinks_checker_nodeapi, so I admit to this error (but I think I copied from somewhere else).

6a. We have a database abstraction layer, please use it. Don't do this in D6: ... WHERE type='weblinks' ... or ... n.status=0....
You had two points labelled 6, so this may be why this one was overlooked. Could you explain what you would like to see instead of the SQL above?

6b. Line break within t( ) - there is only one place in admin.inc which has this, in desc of form['ignore'] in weblinks_checker_settings()
There are none in .module.

7. Double quotes unnecessary - There are 2 in hook_user, 1 in hook_nodeapi, 2 in hook_validate, 2 in hook_cron. In .admin.inc there are two translatable strings which only use double quotes.

12. Return code 0. The point is that 0 was not originally in the ignore array but was recently added in my enhancements. If you have a better concise description to be shown on the admin page, please suggest it. I take your point that error 0 could be for various reasons, so the comment in the source code could be altered to reflect this.

16. Checking http://example.com/weblinks I have just tried this - If you do not have 'check link when adding' enabled, then the link is added to the db with status 200. On the first cron run this is changed from 200 to 404, and 404 is saved in the db.

Nancy, would you like me to make a patch for the obvious fixes (ie 2, 3, 6b and 7 above) so that we can start with a fully compliant and clean version before you go on to make any real code changes (if any). If you would like me to make the change to remove single quotes around %d in 5 then I can do this too. There are some long unbroken strings for other form descriptions, which could be split into shorter translatable strings, but I would defer to your opinion on these.

Jonathan

#9

hass - August 21, 2009 - 06:40

#1. I'm saving this in the DB... the server could answer with a defined status code, but the message is different... I cannot remember if I have verified in past if watchdog also have issues to save... but this could happen.

#2 very minor :-)

#3. Not only one line... I would guess - over 100 - for e.g. weblinks_checker.admin.inc

Wrong

  $form['weblinks_allow_dupes'] = array(
    '#type' => 'radios',
    '#title' => t('Allow duplicate URLs'),
    '#default_value' => variable_get('weblinks_allow_dupes', 0),
    '#options' => array(1 => t('yes'), 0 => t('no'), 2 => t('warn')),
    '#description' => t('This check is performed at link creation time. Most users will not want to allow duplicate URLs.'),
    '#prefix' => '<div class="weblinks-radios">',
    '#suffix' => '</div><div class="clear-block"></div>',
    );

Correct:

  $form['weblinks_allow_dupes'] = array(
    '#type' => 'radios',
    '#title' => t('Allow duplicate URLs'),
    '#default_value' => variable_get('weblinks_allow_dupes', 0),
    '#options' => array(1 => t('yes'), 0 => t('no'), 2 => t('warn')),
    '#description' => t('This check is performed at link creation time. Most users will not want to allow duplicate URLs.'),
    '#prefix' => '<div class="weblinks-radios">',
    '#suffix' => '</div><div class="clear-block"></div>',
  );

#6a. WHERE type='%s' ... or ... n.status=%d....", 'weblinks', 0)

#6b. Yes, If I see one I expect more :-). Keep in mind I have only reviews the checker module.

     '#description' => t('The validity checker will treat all the selected codes as OK.
      For more information, see <a href="!url" target="_blank">this page</a>.',

16. You do not need to check example.com URLs. This makes no sense... they are always 404. I decided to keep them unchecked in linkchecker.

#10

hass - August 21, 2009 - 06:42

~110-120 checks are realistic in normal cases within 240 seconds

Correction: ~110-120 checks are realistic in normal cases within 120 seconds using HEAD method. :-)

#11

NancyDru - August 27, 2009 - 14:19

2. Actually, the English business style guides call for double spaces between sentences, so I often accidentally follow that practice.

3. Even the standards allow this as a matter of preference. I prefer array closing parentheses to align with the elements. You will find some examples of this in core and major contribs. Coder will not flag it.

6a. I have seen the practice of moving hard-coded values to the values. I think that is hogwash! You complain about the insignificant performance implications of single vs double quotes, and yet you want to accept the performance impact of value substitution? I will not "fix" that.

7. There have been many debates over the single/double quote "issue" on queries and the best advice that has been given so far is that it is a matter of personal taste. So my preference, considering how often single quotes are used inside queries is to always use double quotes. The performance is so minuscule that it is easier to avoid any potential maintenance impact by doing this. The only double quotes I see in hook_validate are proper English punctuation.

16. How many people are really going to put example.com in their Web Links? It is not worth the effort or performance impact to check for this special case.

Thanks for the offer, Jonathan, but I have already committed the changes that I will make.

I will be switching to the "head" method.

#12

NancyDru - August 27, 2009 - 15:37
Status:active» fixed

HEAD method committed, limit changed to 150.

#13

hass - August 27, 2009 - 17:56
Status:fixed» needs work

3. I only know the rules from http://drupal.org/coding-standards#array. If you can find your variant in D6 core - it must be a code style bug.

6a. Proper escaping the most important here. If you know other documentation in d.o, please provide me a link that proves this is not a bug.

Preventing SQL injection is easy; db_query provides a way to use parametrized queries. Drupal's database functions replace the sprintf-like placeholders with the properly escaped arguments in order of appearance.

7. It's for database portability. I'm currently not able to find the written rule for double quotes around the complete SQL statement... The general rule is: db_query("SELECT foo FROM {table} WHERE foo = '%s' AND bar = %d"). Also see http://drupal.org/node/2497 for the SQL coding rules,

Any string literal or %s placeholder must be enclosed by single quotes: ' . Never use double quotes.

16. People writing documentation.

#14

NancyDru - August 27, 2009 - 18:09
Status:needs work» closed

3. While the example shows it that way, notice that it is not specified as a part of the standard. That makes it "personal preference."

6a. Tell me how it is not properly escaped, or how injection attacks are possible.

7. And that's what I do. BTW, notice in the example on that page that the values are hard-coded rather than being moved to the arguments.

#15

hass - August 27, 2009 - 20:32
Status:closed» needs work

Have you taken a look to the links I've posted above?

3. This is not personal preference. It's a code style bug. I cannot see any comment that something is personal preference here and that different indention is allowed. Please show me any line of code in core that have such an indention with the closing bracket. I'm very interested to see the line...

6a. It's clearly written Never use double quotes. If I remember you said it's ok to have db_query('SELECT foo FROM {table} WHERE foo = "%s" AND bar = "%d"'). This is incorrect and a bug -> Never use double quotes and also never surround integers with quotes! This are bugs not personal preference - if you can point me to a wrong documentation page - I would be happy to fix this page.

7. No, I've seen single/double quotes around %d. This may not DB independent and is not correct as it makes the SQL server to do string to integer conversion. This is always wrong and takes processing time -> bug.

#16

NancyDru - August 27, 2009 - 20:39
Status:needs work» closed

6a. That's not what I said at all. The quotes in your example are backwards and I do not quote integers.

This issue is closed until such a time as the Coder module identifies the things you claim as bugs. I suggest you open an issue there.

#17

hass - August 27, 2009 - 21:32

Coder will NEVER be able to identify all bugs or coding style issues. Do not trust this buggy tool.

#18

hass - August 27, 2009 - 22:55
Priority:normal» critical
Status:closed» needs review

1. There are still yes/no radios where checkboxes should be and the text still talks about "Check". I hoped to see only one reference example that proves that this yes/no is not bad usability and if you have found one of such radio on linux/windows/drupal core or any other of the top 10 modules on d.o.

  $form['weblinks_validate_check'] = array(
    '#type' => 'radios',
    '#options' => $yesno,
    '#prefix' => '<div class="weblinks-radios">',
    '#suffix' => '</div>',
    '#title' => t('Check if link is valid when entered'),

2. Here I've also found a context sensitive bug. This need to become two translatable sentences. One with static "nothing" and one with the number. Otherwise it may not translatable correctly as nothing have many meanings. No fix in the attached patch!

    form_set_error('weblinks_checker_unpublish_after', t("You entered @unpublish_after for 'unpublish after'. Please use a number in the range 1 to 99.", array('@unpublish_after' => ($unpublish_after == '' ? t('nothing') : $unpublish_after))));

3. weblinks_checker_validate() does a parse_url, but doesn't add username, passwort and custom ports and other items back. Take a look to drupal_http_request() to get an idea.

4. Use db_placeholders() for this statement only. http://api.drupal.org/api/function/db_placeholders/6 and NEVER use doubles quotes inside the SQL string as written in the Drupal docs.

     $sql .=  ' node_published ASC, (case when (l.last_status in ("'. implode('","', $good_status_list) .'")) then " " else l.last_status end) DESC,';

5. Security bugs: Current code allows the remote server to inject malformed code that should be executed in watchdog views! I will inform security team about this asap.

Correct:

        $newurl = substr($response->redirect_url, 0, 1) == '/' ? $url . $response->redirect_url : $response->redirect_url;
        watchdog('Weblinks', '@url updated to @new.', array('@url' => $url, '@new' => $newurl));

      watchdog('Weblinks', '@url reports @status @extra.', array('@url' => $url, '@status' => $status, '@extra' => $extra));

The attached patch fix not all of the above issues. NOTE: this patch is NOT tested at all. All fixes are made codewise. Maybe I missed a bracket... not sure.

AttachmentSize
weblinks_checker-bugs-D6.patch 18.78 KB

#19

hass - August 27, 2009 - 22:51

#20

hass - August 28, 2009 - 12:15

Commit http://drupal.org/cvs?commit=256930 is invalid! Remove check_plain from @variable placeholders.

watchdog('Weblinks', '@url reports @status @extra.', array('@url' => check_plain($url), '@status' => $status, '@extra' => $extra));
watchdog('Weblinks', '@url updated to @new.', array('@url' => check_plain($url), '@new' => $newurl));

#21

hass - August 28, 2009 - 12:20

Looking on the commit messages it looks like nearly non of the above reported bugs and code style issues are fixed.

#22

NancyDru - August 29, 2009 - 01:46
Status:needs review» closed

#18:
1 - This is what the module owner wants; that's the way it will be. Period. [BTW, I agree with him.]
2 - fixed
3 - If it did, valid_url will fail because of bugs that have been reported. I don't have the time at them moment to look up the issue numbers. Currently, the primary value of valid_url is to check if a WYSIWYG editor has messed with it. However, a recommended setting then will go through a full drupal_http_request.
4 - fixed
5 - security issues should never be disclosed publicly. fixed.

#20:
Those were redundant messages and Jonathan already fixed them.

We may disagree on the interpretation of the Coding standards, but until our code gets flagged, our understanding will remain as is. End of discussion.

#23

rmiddle - August 29, 2009 - 03:14

hass,

This is getting borderline abusive. This thread is being closed. We have reviewed you suggesting and found a few did need fixed. Thanks for that but other are just our prefered way of doing thing. Please respect that and drop the topic.

Thanks
Robert

#24

hass - August 29, 2009 - 10:20

I'm sorry for the many many bugs and code style issues in this module. I'm also not happy to review such a small submodule and find sooo many issues. If it's "borderline abusive" to report 100 issues and to verify if something has been committed than it's how it is. If you say this for the radio usability complain, than it's also as is - checkboxes are 1000 times better. This is my last comment on this thread.

Most of the changes are not for me - they are for all translators.

Please capitalize the below translatable strings correctly. This was also in my patch and does not make translators to maintain two strings of yes/Yes and no/No:

Correct

'#options' => array(1 => t('Yes'), 0 => t('No'), 2 => t('Warn')),

 
 

Drupal is a registered trademark of Dries Buytaert.