Comments

mgifford’s picture

subscribe. This is a useful module for checking major upgrades.

yukare’s picture

Those issue #562694 will not be fixed for drupal 7.
I really like this module and find it usefull. Anyone working on a drupal 7 version? I will start working on it.

hass’s picture

Code is in HEAD... but only 2/3 ported and it does not work. I definitely need help how to read the new "fields". The logic have changed and I have not found out how the APIs works. The remaining rest is not that difficult to upgrade, but if I cannot solve the "fields" stuff I cannot work on the rest.

yukare’s picture

I am working and at this weekend will make a patch to you review.
The code in the head is not 2/3 as you said, it is much more ready than this.
Now i can make it extract the links from body fields (and will be simple add other types of fields), run cron to see broken links, edit the broken links.
Just wait the weekend and we will have a working version(incomplete).

This is the code to list fields and get the value:

 $field_list = field_info_fields();
  foreach($field_list as $name => $field) {
    if (in_array($node->type, $field['bundles']['node'])) {
      switch($field['type']) {
        // this is the case of body field
        case 'text_with_summary':
          // this is because a php error
          $nodeField = $node->$name;
          $text_items[] = _linkchecker_check_markup($nodeField[$node->language][0]['value'],
            $nodeField[$node->language][0]['format'], $node->language);
          $text_items[] = _linkchecker_check_markup($nodeField[$node->language][0]['summary'],
            $nodeField[$node->language][0]['format'], $node->language);
          break;
      }
    }
  }
yukare’s picture

StatusFileSize
new21.04 KB

As promissed, the patch is my current work:
* Updated the tests for drupal 7, and all they pass now.
* Can check the links from the body of a node and from comments.
* Can view the report of broken links, and the cron task to verify which links are broken is working

Need to do:
* Test the upgrade from drupal 6, i did not change the database, so maybe it works without any change
* Need to add another fields types to parse in the content(easy task)
* Need to work on the changing link in the case of a 301 redirect( i did not touch this part yet)
* Need work on node umpublish when the link report as 404 error( i did not touch this part yet)

What i want:
* Please create a drupal 7.x branch on cvs, so will be possible create tasks/bug reports for it.
* Review the patch, it is not ready to commit direct on cvs, because it may have some dpm call and things like this, when i clear my to do list, i will provide a patch ready to commit.
* Report anything that is not in my to do list above.

yukare’s picture

I will do a patch latter, so this is a update from last patch:

* Convert all md5() to sha1 - Done, but this cause the lost of upgrade path, so current you have to uninstall(really uninstall, not only disable) the module on drupal 6 and after upgrade, install it again on drupal 7
* Change the link on 301 redirect - Working
* Umpublish the node on 404 error - Working

hass’s picture

We can run an update hook to recalculate all sha1 hashes from the URLs. This can for sure be done without a reinstall, but requires some patience on the upgrade (may be a very long batch process), or we can do this in the background via cron. Otherwise we could set all links to require immediate re-check. I'm not very happy about both, but staying with md5 seems no option for some US government rules.

I currently prefer a batch process in an update hook.

yukare’s picture

StatusFileSize
new41.68 KB

New patch:
* No more warning on watchdog, report if there is any for you
* Add The patch at #1021384
* Search for links in text_with_sumary, text_long fields and replace in case of 301 redirect. Search but do not replace in fields 'text'
* Everything must be working as is in drupal 6 version

To do:
* Upgrade path :(
* Improve the tests(lot of things without a test)
* Replace the links in case of a 301 in field 'text' (must take care with the size)
* Write documentation using advanced_help format ?

Possible new feature:
What you think about parse user and taxonomy fields?I will work on this after we have a 7.x.1.0 stable, with the other new features i want and there is open issues for it(ftp link validation, views integration(upgrade existing patch to views 3)).

hass’s picture

Remove #1021384 from code! Do not add new features! I do not like to use advanced_help module in any of my projects. Translation support is awful and there is nothing complex to document.

hass’s picture

Looks like all core modules use 64 length for hashes... I'm not sure how long the new hashes could be. This is the upgrade code.

/**
 * Extend data length of {linkchecker_link}.urlhash field.
 */
function linkchecker_update_7001() {

  db_drop_unique_key('linkchecker_link', 'urlhash');

  $spec = array(
    'type' => 'varchar',
    'length' => 64,
    'not null' => TRUE,
    'description' => 'The indexable hash of the {linkchecker_link}.url.',
  );

  db_change_field('linkchecker_link', 'urlhash', 'urlhash', $spec, array('unique keys' => array('urlhash' => array('urlhash'))));

  return t('Extended data length of {linkchecker_link}.urlhash field.');
}

/**
 * Calculate new hash value for url hashes.
 */
function linkchecker_update_7002() {
  // TODO: Run chunks of 1000 with batch

  return t('Calculated new hash values for url hashes.');
}
hass’s picture

Incorrect change from #1040474: Report relative links as broken when drupal is out of webserver root dir needs to be rolled back. As said - only upgrade and don't change code style, blank lines or logics, please.

yukare’s picture

The size of hash is 43, but since core uses 64 must be a reason and i will keep at 64 then.
I will remove #1040474 since it was a error, sorry. It fixes a diff from my real site and my dev site, but will break things in another cases.
I can remove #1021384 but please consider its inclusion, it is a small path, all the tests passes with it, and it is a nice feature to have that will not change anything for who do not want use it.
Today i will do another patch with those changes.

hass’s picture

I'm very sure that #1021384: New link checker mode : internal only. will go in later, but for now we should upgrade only and than we can integrate other features. For review it would be much easier to review/commit smaller patches. For example I would love to review a patch that only change the md5 stuff. After we have the correct database we can go forward with cck/fields and the rest...

I have seen action.aid is 255 in size an saves the same hash like we. I'm very confused how big the field must really be... Maybe aid is much more bigger size than the data saved in this field. Maybe a core bug.

Why 43? Php docs say 40 on sha256. Why does base64 add 3 letters... Strange. Thought about using sha256 only...

yukare’s picture

StatusFileSize
new43.08 KB
new99.26 KB

* removed #1040474
* add upgrade patch from drupal 6, the only way to do this is in a upgrade hook, if we use cron, it will make the module unusable after installation until all cron is run.
* keep the hashes using drupal_hash_base64, using sha256 will not make any sensible difference, it can be used in urls if needed(maybe with views)

I will not restart everything and create a bunch of small patches now which will not work(because all those changes are needed to make the module work), so this is my last patch until there is a drupal 7 release, when i will work on ftp and views support.

This time there is a zip file with the module too.

Anonymous’s picture

subscribe

mike503’s picture

subscribing as a d7 port of this is VERY much desired!

could there be an initial dev release or alpha which doesn't have the "extra fields" support that is making things more complex, and that gets put in later once it's been implemented? might be a way to get users to start testing the module's basic functionality + scan the normal "body" content areas (unless those got pushed off into the field API too now)

i for one do not *need* an upgrade path for sites i'm starting on d7. i don't know if that breaks a social rule that any version must have an upgrade path, but perhaps if it was dev, the upgrade issues could also be "coming shortly" too?

yukare’s picture

I simple do not know what is missing for a dev release, current the module have everything that works on d6 version, have support for the most of fields, whicj is am improviment from d6, have an working upgrade path from d6.

"might be a way to get users to start testing the module's basic functionality + scan the normal "body" content areas (unless those got pushed off into the field API too now)"
Yes, everything in a node that is not the title is now a field, but this works on d7 port.

"i for one do not *need* an upgrade path for sites i'm starting on d7. i don't know if that breaks a social rule that any version must have an upgrade path, but perhaps if it was dev, the upgrade issues could also be "coming shortly" too?"
There is an upgrade path.

Since was about a month and the developer did nothing about this, i will create a sand box project and commit everything in small patchs as he requested, may be he can put it on a official d7 version, and i will work in add suport for the link module, views and ftp.

mgifford’s picture

Well, we're in an age now of git, so you can just clone the repository & work from there.

I looked at your zip file & compared it to the existing releases (dev & stable d6) but didn't see a neat correlation between the two.

Took a closer look using meld and would really like to get a better sense of how you created your patch so that I can apply it against an existing release.

Generally, there'd be a d6->d7 patch that could be reviewed compared with the existing code. The files are moved around so it's harder to do this. It would then be reviewed by a few people before being marked rtbc and brought out as a dev release.

Doesn't always work like this in contrib, and I have no idea if the maintainer is still involved or not. However, often sending an email to somewhere other than the issue queue can help.

Thanks for doing this work. In my review of the patch it's clear you've spent a bunch of time on this and it would be great to see it brought into a d7 release!

yukare’s picture

The path apply to cvs head(look at #3 and #4), now master in git, where there is a previous work started in porting to drupal 7, not in 6-dev or 6-stable.

hass’s picture

Could you guys stop stressing me, please? As said it's easier to review/test smaller patches than one monster patch. If you do not like to provide smaller patches the review may take much longer.

yukare’s picture

And the patch at #1021384 which is small and you did not commit? and the pathc about views which is important and you did not commit?
I will not work anymore on this, is a good module, but you do not care about it. I have views 3 and ftp working, but if you want i redo all the patch and make everything from start again.
I really to do not what is so hard to review in this path, and more because it is agains a now working version, as the drupal 7 port do not work on git.

hass’s picture

hass’s picture

Status: Postponed (maintainer needs more info) » Needs work

As said earlier - don't mix patches! Remove #1021384: New link checker mode : internal only..

hass’s picture

So, I've found the time to do a code wise review. This is a very big list. Patch cannot committed how it is today.

+++ linkchecker.info	31 Jan 2011 16:02:30 -0000
@@ -1,12 +1,16 @@
+; List of files
 files[] = linkchecker.admin.inc
 files[] = linkchecker.batch.inc
 files[] = linkchecker.install
-files[] = linkchecker.module
 files[] = linkchecker.pages.inc
-files[] = linkchecker.test
\ No newline at end of file
+files[] = linkchecker.module
+files[] = linkchecker.test

We need to remove all this entries if there are no classes defined in the PHP files. No idea why this order has been changed.

+++ linkchecker.install	31 Jan 2011 16:02:30 -0000
@@ -110,9 +110,9 @@ function linkchecker_schema() {
+        'length' => 43,

@@ -218,15 +218,69 @@ function linkchecker_modules_disabled($m
+function linkchecker_update_7001() {

linkchecker_update_7001 make it 64? This is inconsistent.

+++ linkchecker.install	31 Jan 2011 16:02:30 -0000
@@ -218,15 +218,69 @@ function linkchecker_modules_disabled($m
-  db_rename_table($ret, 'linkchecker_boxes', 'linkchecker_block_custom');
-  db_rename_table($ret, 'linkchecker_comments', 'linkchecker_comment');
-  db_rename_table($ret, 'linkchecker_nodes', 'linkchecker_node');
-  db_rename_table($ret, 'linkchecker_links', 'linkchecker_link');
+  db_rename_table('linkchecker_boxes', 'linkchecker_block_custom');
+  db_rename_table('linkchecker_comments', 'linkchecker_comment');
+  db_rename_table('linkchecker_nodes', 'linkchecker_node');
+  db_rename_table('linkchecker_links', 'linkchecker_link');

Looks like this patch is not against latest DEV.

+++ linkchecker.install	31 Jan 2011 16:02:30 -0000
@@ -218,15 +218,69 @@ function linkchecker_modules_disabled($m
+function linkchecker_update_7004(&$sandbox) {

After 7001 the next update hook number is 7002, not 7004.

+++ linkchecker.install	31 Jan 2011 16:02:30 -0000
@@ -218,15 +218,69 @@ function linkchecker_modules_disabled($m
+  ¶

Trailing spaces need to be removed.

+++ linkchecker.install	31 Jan 2011 16:02:30 -0000
@@ -218,15 +218,69 @@ function linkchecker_modules_disabled($m
+    $links = db_query('select count(*) from {linkchecker_link}')->fetchField();

count(*) is VERY expensive for INNODB and should always be written uppercase with the column name e.g. COUNT(lid)

+++ linkchecker.install	31 Jan 2011 16:02:30 -0000
@@ -218,15 +218,69 @@ function linkchecker_modules_disabled($m
+  $result = db_query_range('select url, lid from {linkchecker_link}', ($sandbox['chunk'] * $size), $size);

Code style is not good. SQL upercase...

+++ linkchecker.install	31 Jan 2011 16:02:30 -0000
@@ -218,15 +218,69 @@ function linkchecker_modules_disabled($m
+    db_query('update {linkchecker_link} set urlhash = :hash where lid = :lid', array(':hash' => $hash, ':lid' => $row->lid));

Code style...

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -116,8 +116,8 @@ function linkchecker_menu() {
-  $items['linkchecker/%linkchecker_link/edit'] = array(
-    'access arguments' => array('edit link settings'),
+  $items['linkchecker/%/edit'] = array(
+    'access arguments' => array('Edit link settings'),

Why has this been changed? %linkchecker_link refences to function linkchecker_link_load($lid)

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -190,7 +190,6 @@ function _linkchecker_status_handling($l
-    case 200:
     case 304:

Patch is difficult to read, but 200 is handled like 304. Patch looks there is a difference made now. Rollback.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -204,71 +203,43 @@ function _linkchecker_status_handling($l
+        ¶

Trailing space

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -279,23 +250,15 @@ function _linkchecker_status_handling($l
+            // replace links on subject

"Replace links in subject." Don't be lazy, make periods if sentences are finished and start with uppercase first.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -414,6 +378,51 @@ function _linkchecker_status_handling($l
+ * ¶
+ * @param Objetc $entity
+ *        The object we are working on, can be a $node, $comment
+ *  ¶
+ * @param String $entityType
+ *        The type of entity, like $node->type or $comment->node_type ¶
+ * ¶
+ * @param String $bundle
+ *        The type of bundle like 'node' or 'comment' ¶
+ * ¶
+ * @param String $oldURL
+ *        The previous url
+ * ¶

Trailing spaces and code style. Remove empty lines and make periods.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -414,6 +378,51 @@ function _linkchecker_status_handling($l
+      // this is because a php error

Text style

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -436,10 +445,28 @@ function linkchecker_node_prepare($node)
- * Implement hook_node_publish().
+ * Implement hook_node_insert
+ *
+ * @param object $node The node being inserted
+ *
+ * This function happens when a new node is inserted, so parse it and search
+ * for links

Unpublished nodes will not checked for broken links! We don't do the extraction on insert.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -556,7 +583,13 @@ function linkchecker_block_custom_delete
+  ¶

Trailing spaces.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -671,6 +670,40 @@ function _linkchecker_add_node_links($no
+function linkchecker_parse_fields($entity, $type, &$text_items) {

Needs documenatation what the function does and how it works.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -671,6 +670,40 @@ function _linkchecker_add_node_links($no
+    // the followning line will always cause a warning, because only
+    // of the two options will be set, so the @
+    if (@in_array($type, $field['bundles']['node']) || @in_array($type, $field['bundles']['comment'])) {
+      // this is because a php error

Text style

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -671,6 +670,40 @@ function _linkchecker_add_node_links($no
+              $text_items[] = _linkchecker_check_markup($item['value'],
+                $item['format'], $entity->language);
+              $text_items[] = _linkchecker_check_markup($item['summary'],
+                $item['format'], $entity->language);

Code style. Don't make newlines.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -671,6 +670,40 @@ function _linkchecker_add_node_links($no
+              $text_items[] = _linkchecker_check_markup($item['value'],
+                $item['format'], $entity->language);

Codestyle: No newline

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -780,15 +813,25 @@ function _linkchecker_add_comment_links(
+  // Remove a warning if it is not defined
+  if (isset($block_custom->info)) {
+    $text_items[] = _filter_url($block_custom->info, $filter);
+  }
+  if(isset($block_custom->title)) {
+    $text_items[] = _filter_url($block_custom->title, $filter);
+  }
+  // $block_custom can come from a scan on nodes and blocks or from editing a block
+  // and the object is not the same from both sources

Text style.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -780,15 +813,25 @@ function _linkchecker_add_comment_links(
+  } else {

Code style

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1168,21 +1211,23 @@ function _linkchecker_extract_links($tex
+    $mode = variable_get('linkchecker_fqdn_only', 1);
+    // Full qualified URLs : $mode = 0 or 1
+    if ($mode != 2  && valid_url($url, TRUE)) {

Don't mix up patches! Roll this out!

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1168,21 +1211,23 @@ function _linkchecker_extract_links($tex
+    // Local URLs : $mode = 0 or 2
+    if ($mode != 1 && valid_url($url, FALSE) ) {

Patches mixed.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1168,21 +1211,23 @@ function _linkchecker_extract_links($tex
+        global $base_path;

???

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1213,9 +1258,8 @@ function _linkchecker_extract_links($tex
-    }
+	}

Wrong indention?

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1333,12 +1378,13 @@ function _linkchecker_link_replace(&$tex
+function _linkchecker_check_markup($text, $format, $langcode = '', $cache = TRUE) {

Make sure this is a 100% clone of core with exclusion added. Unverified yet.

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1361,10 +1406,14 @@ function _linkchecker_check_markup($text
+        //run only enabled filters, not all

Text style

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1361,10 +1406,14 @@ function _linkchecker_check_markup($text
+            //$text = call_user_func($filter_info[$filter->name]['prepare callback'], $text, $format, $langcode, $cache_id);

Duplicated line

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1372,10 +1421,14 @@ function _linkchecker_check_markup($text
+        //run only enabled filters, not all

Text style

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1372,10 +1421,14 @@ function _linkchecker_check_markup($text
+              //$text = call_user_func($filter_info[$filter->name]['process callback'], $text, $format, $langcode, $cache_id);

Duplicated line

+++ linkchecker.module	31 Jan 2011 16:02:31 -0000
@@ -1540,7 +1593,8 @@ function _linkchecker_unpublish_nodes($l
+    watchdog('linkchecker', 'Set @type %title to unpublished.', array('@type' => $node->type, '%title' =>$node->title),
+       WATCHDOG_NOTICE, l($node->title, 'node/' . $node->nid));

Remove line break!

+++ linkchecker.pages.inc	31 Jan 2011 16:02:32 -0000
@@ -119,7 +119,7 @@ function _linkchecker_report_page($query
-  $access_edit_link_settings = user_access('edit link settings');
+  $access_edit_link_settings = user_access('Edit link settings');

"edit link settings" is an internal permission name. Need to be kept lower-case.

+++ linkchecker.pages.inc	31 Jan 2011 16:02:32 -0000
@@ -173,14 +173,15 @@ function _linkchecker_report_page($query
+        // Must set attributes and title as empty values because there is a
+        // error and a warning if do not set

Period missing.

+++ linkchecker.pages.inc	31 Jan 2011 16:02:32 -0000
@@ -173,14 +173,15 @@ function _linkchecker_report_page($query
+        theme_item_list(array('items' => $links, 'type' => 'ul', 'attributes' => array(), 'title' => "")),

Never use double quotes if not absolutely required > code style.

+++ linkchecker.pages.inc	31 Jan 2011 16:02:32 -0000
@@ -195,12 +196,11 @@ function _linkchecker_report_page($query
-function linkchecker_link_edit_form(&$form_state, $link) {
-
+function linkchecker_link_edit_form($vars, &$form_state, $link) {
+  $link = db_query('SELECT * FROM {linkchecker_link} WHERE lid = :lid', array(':lid' => $link))->fetchObject();
   $form['settings'] = array(
     '#type' => 'fieldset',
     '#title' => t('Settings'),
@@ -239,7 +239,7 @@ function linkchecker_link_edit_form(&$fo

@@ -239,7 +239,7 @@ function linkchecker_link_edit_form(&$fo
     '#default_value' => 0,
     '#type' => 'checkbox',
     '#title' => t('Re-check link status on next cron run'),
-    '#description' => t('Enable this checkbox if you need an immediate re-check of the link and cannot wait until the next scheduled check at @date.', array('@date' => format_date($link['last_checked'] + variable_get('linkchecker_check_links_interval', 2419200)))),
+    '#description' => t('Enable this checkbox if you need an immediate re-check of the link and cannot wait until the next scheduled check at @date.', array('@date' => format_date($link->last_checked + variable_get('linkchecker_check_links_interval', 2419200)))),

Looks like a change that is cause by the wrong menu change.

+++ linkchecker.test	31 Jan 2011 16:02:32 -0000
@@ -247,6 +249,7 @@ EOT;
+    ¶

Trailing space.

+++ linkchecker.test	31 Jan 2011 16:02:32 -0000
@@ -254,17 +257,18 @@ EOT;
+    $this->verbose(theme_item_list(array('items' => $rows, 'type' => 'ul',
+      'attributes' => array(), 'title' => "URLs in database:")));

Invalid newline.

+++ linkchecker.test	31 Jan 2011 16:02:32 -0000
@@ -285,7 +289,8 @@ EOT;
+    $this->verbose(theme_item_list(array('items' => $urls_relative, 'title' => 'Verify if following relative URLs exists:' ,
+      'attributes' => array(), 'type' => 'ul')));

Invalid newline.

Powered by Dreditor.

yukare’s picture

Let me start it again.
I have created a sand box project at http://drupal.org/sandbox/yukare/1106574 now that we have git on drupal.org and will work on a patch there, once the patch is ready, you commit it and i remove/abandon that sandbox.
I will make all changes from d6 to d7 in many small commits on the sandbox, one for each change, so will be easy to review but put only a full patch here, is that ok for you?

I will fix all codding style things. One of it, i was breaking long lines into two, if this is wrong i will not do it anymore.

Some questions about your review:

Looks like this patch is not against latest DEV.

The patch is against head, as you said in #4, do you want it against last 6.x-dev ?

After 7001 the next update hook number is 7002, not 7004.

Sure, and i need to change it. It was because i was working on the update, and once a 7002 update is done on drupal, i can not run it again, so i changed to 7003 and 7004, and forgot to change it again to 7002 before make the path.

We need to remove all this entries if there are no classes defined in the PHP files. No idea why this order has been changed.

Ok.

count(*) is VERY expensive for INNODB and should always be written uppercase with the column name e.g. COUNT(lid)

I will fix this and all other coding syntax in sql code. SQL functions must be uppercase. There is any other way to count records that is best for INNODB?

Patch is difficult to read, but 200 is handled like 304. Patch looks there is a difference made now. Rollback.

This is missing documentation and i will make this change clear: drupal send the message 200 for redirected code 301. If a site redirect to another place (301 code) and drupal can read the new address, drupal returns a 200 code(not a 301)(maybe a d7 change) AND a redirect_status of 301

Unpublished nodes will not checked for broken links! We don't do the extraction on insert.

There is not publish/umpublish on drupal 7, only save/delete, so i can test on save if it is published when the node changes.

Needs documentation what the function does and how it works.

Ok. It really need documentation. Since fields on the core, the way to get information on a field for a content type/block/comment(may be we could verify user profile fields in future) is near the same, so i moved it to a function.

yukare’s picture

"Looks like this patch is not against latest DEV."

Now i understand what was wrong. There was some commits on HEAD(now master on git) after i started to work on the path.

hass’s picture

http://drupal.org/node/360052 and subpages are the forcing rules. Nobody needs a sandbox, but a very clean patch is absolutly required. I will not commit patches where every 2/3 lines have code style issues or if the patch adds other new bugs or mixes patches that are under discussion like #1021384: New link checker mode : internal only.. Bugs need to be whiped out before a patch gets committed - not afterwards. As suggested, smaller patches may make the review easier - at least if I need to comment every 2/3 lines.

yukare’s picture

StatusFileSize
new7.18 KB

As requested, a small patch, it is not the full port to drupal 7, but one task: parse all nodes content, get the urls and add it to database. To test it, just press the buttom "Clear link data and analize content for links" on admin page.
I hope i fixed all issues you pointed in this part of the code.
This patch is build on master git branch using "git diff" so apply it with "git apply [patch.name]".

hass’s picture

Please don't attach patches with numbers on the end of they will be ignored... :-( Buggy testing infrastructure...

+++ b/linkchecker.module
@@ -568,44 +568,10 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+  ¶

Trailing whitespace

+++ b/linkchecker.module
@@ -676,6 +642,54 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+ * ¶
...
+ * ¶

Trailing spaces need to be removed. See Drupal code style. I suggest to enable auto cleanup trailing white space in your editor...

+++ b/linkchecker.module
@@ -676,6 +642,54 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+    if (@in_array($type, $field['bundles']['node']) || @in_array($type, $field['bundles']['comment'])) {

Why is there an @? We write E_ALL save code and this may hide possible bugs. It must be avoided under all possible circumstances.

+++ b/linkchecker.module
@@ -676,6 +642,54 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+              $text_items[] = _linkchecker_check_markup($item['value'],
+                $item['format'], $entity->language);
+              $text_items[] = _linkchecker_check_markup($item['summary'],
+                $item['format'], $entity->language);
...
+              $text_items[] = _linkchecker_check_markup($item['value'],
+                $item['format'], $entity->language);
+            }

We don't make such line breaks. See Drupal code style.

+++ b/linkchecker.module
@@ -676,6 +642,54 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+          break;
+        case 'text_long':

Blank line required - see Drupal code style

+++ b/linkchecker.module
@@ -1337,9 +1351,12 @@ function _linkchecker_link_replace(&$text, $old_link_fqdn = '', $new_link_fqdn =
+	if (isset($text)) {
+	  // If the format for the filter is not set, use the fallback.
+	  if ($format) {
+  	  $format = filter_fallback_format();

Tabs need to be spaces

+++ b/linkchecker.module
@@ -1365,10 +1382,13 @@ function _linkchecker_check_markup($text, $format = FILTER_FORMAT_DEFAULT, $lang
+        //run only enabled filters, not all

@@ -1376,10 +1396,13 @@ function _linkchecker_check_markup($text, $format = FILTER_FORMAT_DEFAULT, $lang
+        //run only enabled filters, not all

Upper-case first, period

Powered by Dreditor.

hass’s picture

+++ b/linkchecker.moduleundefined
@@ -568,44 +568,10 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
   $text_items[] = _filter_url($node->title, $filter);
...
-  // Search for links in 'weblink' nodes from 'links' module package.
-  if (module_exists('links_weblink') && $node->type == 'weblink' && !empty($node->links_weblink_url)) {
-    $text_items[] = _filter_url(url($node->links_weblink_url, $url_options), $node->format);
-  }
-
-  // Search for links in 'weblinks' nodes from 'weblinks' module.
-  if (module_exists('weblinks') && $node->type == 'weblinks' && !empty($node->url)) {
-    $text_items[] = _filter_url(url($node->url, $url_options), $node->format);
-  }
-
-  // Search for CCK-fields of types 'link' and 'text'.
-  if (module_exists('content')) {
-    // FIXME: How to port to D7???
-    $fields = content_fields(NULL, $node->type);
-    foreach ($fields as $field) {
-      if (!empty($node->{$field['field_name']})) {
-        if (module_exists('link') && $field['type'] == 'link') {
-          foreach ($node->$field['field_name'] as $delta => $item) {
-            if (!empty($item['url'])) {
-              // Make non-absolute urls absolute or they are not found by _filter_url().
-              // FIXME D7: needs language parameter for check_markup.
-              $text_items[] = _filter_url(url($item['url'], $url_options), $node->format);
-            }
-          }
-        }
-        elseif (module_exists('text') && $field['type'] == 'text') {
-          foreach ($node->$field['field_name'] as $delta => $item) {
-            $text_items[] = _filter_url($item['value'], $filter);
-          }

What about the other CCK field types that partly needs special handling? They seems to be missing in the patch!?

+++ b/linkchecker.moduleundefined
@@ -676,6 +642,54 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+    // The followning line will always cause a warning, because only

"followning" -> typo

+++ b/linkchecker.moduleundefined
@@ -676,6 +642,54 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+    // The followning line will always cause a warning, because only
+    // of the two options will be set, so the @.

"because only of the two options will be set" may missing a "one".

Powered by Dreditor.

yukare’s picture

Please don't attach patches with numbers on the end of they will be ignored... :-( Buggy testing infrastructure...

I did not know about this.

Why is there an @? We write E_ALL save code and this may hide possible bugs. It must be avoided under all possible circumstances.

We will always have $field['bundles']['node']) or $field['bundles']['comment'] or none of then, but never both of then. I will test before if they are set and if not, i will create it as an empty array, so i can remove the @.

What about the other CCK field types that partly needs special handling? They seems to be missing in the patch!?

When i did the big patch, they do not have a drupal 7 version, but now weblinks and link have, i will install it and fix the patch to add support for it. The links module do not have a drupal 7 version(stable or dev).

Tabs need to be spaces

I did not used tabs, maybe my editor inserted it. I will change the option to convert tabs to spaces and see if there is a option to show tabs, so i will not do this again.

Thanks by your review and i will fix all this before posting a new patch.

hass’s picture

The most important field module to support should be http://drupal.org/project/link. There is a release. The rest can be done later if currently not available.

yukare’s picture

StatusFileSize
new7.6 KB

This patch:
* I set my editor to use 10 spaces for a tab, so i can find easy if there is one, and i removed all.
* Set the editor to remove trailing spaces, so there is not anymore on patch.
* Removed the @ as requested.
* Add suport for the link module, thanks to new fields in core, this is a simple task.
* Found a bug in running filter(and fixed it).

hass’s picture

This one looks good from code style.

+++ b/linkchecker.moduleundefined
@@ -568,43 +568,9 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+  linkchecker_parse_fields($node, $node->type, $text_items);

For the reason of references it may be better to use return in the function... only for PHP 5.3

+++ b/linkchecker.moduleundefined
@@ -676,6 +642,62 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+function linkchecker_parse_fields($entity, $type, &$text_items) {

Is this really PHP5.3 compatible? As I know we should not use & any longer.

+++ b/linkchecker.moduleundefined
@@ -1337,9 +1359,12 @@ function _linkchecker_link_replace(&$text, $old_link_fqdn = '', $new_link_fqdn =
-function _linkchecker_check_markup($text, $format = FILTER_FORMAT_DEFAULT, $langcode = '', $cache = TRUE) {
+function _linkchecker_check_markup($text, $format = NULL, $langcode = '', $cache = TRUE) {
   if (isset($text)) {

I have just checked the D7 version of this core function and it looks different. We need to clone the D7 function first and than add the blacklist stuff and other $cache_id name back. The function was also a 100% clone in D6, except a very few changes. I do not like to maintain other logic or variable names, so let's try to clone and follow core as close as possible. At least as this function is highly critical for security.

Powered by Dreditor.

kehan’s picture

subscribing

yukare’s picture

StatusFileSize
new10.13 KB

Next Patch:
* Same features as the previous.
* The reference(&) issue: on php 5.3 it can create a warning E_DEPRECATED but i think we can use it, because the drupal core use it in so many places.
* _linkchecker_check_markup now is a clone from the version in drupal 7 as requested, I just change it to have the blacklist of filters working and the cache_id to have "linkchecker:".

hass’s picture

+++ b/linkchecker.moduleundefined
@@ -676,6 +642,62 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+function linkchecker_parse_fields($entity, $type, &$text_items) {
+  $field_list = field_info_fields();
+
+  foreach($field_list as $name => $field) {
+    // Prevents a warning in the next if.
+    $field['bundles']['node'] = isset($field['bundles']['node']) ? $field['bundles']['node'] : array();
+    $field['bundles']['comment'] = isset($field['bundles']['comment']) ? $field['bundles']['comment'] : array();
+    if (in_array($type, $field['bundles']['node']) || in_array($type, $field['bundles']['comment'])) {
+      // This is because a php parse error.

Something I do not really understand here is: You push $type to the function, but you are not using it for $field['bundles']['node'] as $field['bundles'][$type]. I guess this could simplilfy the code and make it much more re-usable with blocks or other types with fields will be used in future!? Otherwise I will commit this now to get things forward, but could you explain this, please?

hass’s picture

7.x-1.x branch has been created. All patches should be made on top of this branch from now.

hass’s picture

Version: 7.x-1.x-dev » 6.x-2.x-dev

Removed comment as this unmatched patch segments seems to be a EGit bug

hass’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev
hass’s picture

Have you tested $text_items[] = _linkchecker_check_markup($item['url'], NULL, $entity->language); with link module field? I'm not sure what filters will be applied to this field. I have no clue if filter_fallback_format() contains an URL filter - I guess not. This seems not reliable and broken!

hass’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev
StatusFileSize
new10.21 KB

This is the patch that has been committed. I have changed the code, please review.

yukare’s picture

filter_fallback_format() return the "safe" filter to use, in the case the default is plain_text, which adds <a> to urls and is parsed by linkchecker, i have tested it. But i see now, if the fallback changes to anything that do not run url filter in it, we won´t parse the urls from link module.

hass’s picture

Plain text means "show as is" and no urls?!?

hass’s picture

Changed linkchecker_parse_fields() params and fixed broken code I have broken. Are you familiar with Entity API? I'm asking me if this may replace linkchecker_parse_fields() or can help us in other places a lot!?

yukare’s picture

Some of the changes you done break one thing, it will not work with comments, the line that was:
$text_items = $text_items + linkchecker_parse_fields($node, $node->type);
$node is a object, but $node->type is not always 'node', it is a drupal content type for bundle, may be the name of content type like page, book, mycreatedcontenttype, comment_page, block.
We can not test by $entity->type because this only works for nodes, when we use comments it is $comment->node_type, this is the reason i pass it to function, and it may be another for any possible entity type like node, comment, block, user.
Can i revert this change on next patch please?

hass’s picture

That's very bad, but see latest code and commits, please. I fixed many bugs from #15 patch. We need to extend the function params than. Type is not "node". I found this while testing... It's "page" and other stuff. That's also why I'm asking for enity api.

hass’s picture

Comments feature has been upgraded. Now we need to upgrade blocks and status code handling.

Huwiler’s picture

Category: task » bug

I looked over the linkchecker source and I think it checks only the title texts, so I tried to fix it. Because I'm new in working with Drupal, I don't know if my changes improve the code, so what do you think about this two changes:

File: Linkchecker.module
function linkchecker_parse_fields($entity, $type):
from: if (!empty($field['bundles'][$type]) && in_array($type, $field['bundles'][$type])) {
to: if (!empty($field['bundles']['node']) && in_array($type, $field['bundles']['node'])) {

and in the function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALSE):
from: $text_items = $text_items + linkchecker_parse_fields($node, $node->type);
to: $text_items = array_merge($text_items ,linkchecker_parse_fields($node, $node->type));

hass’s picture

Category: bug » task

This seems all wrong.

jeffwidman’s picture

subscribe

FlavioHuesler’s picture

I think the problem is that we do not know the entity type.

see: http://drupal.org/node/1042822

FlavioHuesler’s picture

What about the workaround by using hook_entity_load?

function mymodule_entity_load($entities, $type) {
  foreach ($entities as $entity) {
    $entity->entity_type = $type;
  }
}

and then using the following if statement:

function linkchecker_parse_fields($entity, $type) {
  $text_items = array();
  $field_list = field_info_fields();

  foreach ($field_list as $name => $field) {
    if (!empty($field['bundles'][$entity->entity_type]) && in_array($type, $field['bundles'][$entity->entity_type])) {
...
hass’s picture

Maybe... When is this hook called? We need a solution that works in batches, too.

FlavioHuesler’s picture

Saw it in the entity.inc

protected function attachLoad(&$queried_entities, $revision_id = FALSE) {
...
// Call hook_entity_load().
foreach (module_implements('entity_load') as $module) {
  $function = $module . '_entity_load';
  $function($queried_entities, $this->entityType);
}

called by DrupalEntityControllerInterface::load()

rodrigoaguilera’s picture

suscribe

Starminder’s picture

subscribe

hass’s picture

@Flavio: feel free to share a well working patch

yukare’s picture

StatusFileSize
new18.52 KB

Please give a look at this patch and tell me what you think about it, but please, this is not a patch ready to commit, it still need more work.
It fixes the tests, but i need to add a test to replace a link when there is a 301 redirect.
It change the link when there is a 301 redirect. About the big change in 200 code handle: when there is a 301 redirect, and the new url exist, drupal gives a 200 code(not a real 301 as drupal 6) AND a redirect_status of 301, so a 200 code can be a real one, or one after a 301.

For anyone that want to help: please use this patch and test it on a copy of your site, and report anything that do not work in the same way as drupal 6, so i can see what is missing.

bowersox’s picture

sub

erikwebb’s picture

subscribe

jrbeeman’s picture

subscribe

skwashd’s picture

subscribe

SanFeda’s picture

subscribe

bryancasler’s picture

subscribe

skwashd’s picture

Updated title so it makes sense in lists with other (unrelated) issues.

hass’s picture

Bullshit removal

mike503’s picture

subscribe

rt_davies’s picture

subscribe

rvilar’s picture

Subscribe

skwashd’s picture

Title: D7 porting to do's » [Link checker] Drupal 7 porting to do's

Fixing title so it isn't "bullshit" in dashboards and tracker lists.

hass’s picture

Title: [Link checker] Drupal 7 porting to do's » D7 porting to do's
FlavioHuesler’s picture

StatusFileSize
new3.43 KB

It would be nice if you could have a look at this small patch.

Changes:
- added the entity_type property: The hook_entity_load (as it says) is called when an entity is loading. Therefore hook_node_insert has to invoke it manually.
- two errors fixed with if(empty(...))
- added $link = new stdClass(); prior to accessing uninitialized objects

yukare’s picture

For me, this $entity->entity_type is a missing thing from drupal api, it is so much usefull that i do not know why it is not in drupal.

FlavioHuesler’s picture

Well it looks like they work on it:
http://drupal.org/node/1042822#comment-4574054

bluegray’s picture

subscribe

hass’s picture

@yukare: have you reviewed the patch in #74? Are you still working on the other stuff?

yukare’s picture

StatusFileSize
new20.56 KB

#78
I have got some code from the patch at #74 and joined with my patch at #60, please review and commit my patch here and give credits to FlavioHuesler too.

#74
"added the entity_type property: The hook_entity_load (as it says) is called when an entity is loading. Therefore hook_node_insert has to invoke it manually."

This is usefull only if we make all changes to not pass $entity_type in any function, but other changes are here, those changes that fixes warnings are always welcome.

hass’s picture

There are again some issues with tabs. Additional to this I'm wondering why we need $link = new stdClass(); db_query() should always save an object into the variable, if I'm not completly wrong. Therefore the object should already been initialized, but is empty - or is this wrong!? Latest patch is missing blocks logic to complete all stuff if I remember correctly.

yukare’s picture

I will look about tabs, sorry.
When a query do not have results, the result is an empty variable, not a object, creating an empty object before trying to use it remove a warning about this.

if (!$link) {
//$link here is the same as a null variable
$link = new stdClass();
// without the prevoius line, there is a error about trying to create a object from a empty variable.
$link->urlhash = $urlhash;

FlavioHuesler’s picture

#81
exactly

#79
True. $entity_type should then be used everywhere but I thought the idea of using $entity_type is better than using the entity type as text in the parameter.

I think you forgot this. I had once an error/warning because $languages[$node->language] was empty not $node->language:
- $url_options = empty($node->language) ? array('absolute' => TRUE) : array('language' => $languages[$node->language], 'absolute' => TRUE);
+ $url_options = (empty($node->language) || empty($languages[$node->language])) ? array('absolute' => TRUE) : array('language' => $languages[$node->language], 'absolute' => TRUE);

polynya’s picture

Category: task » feature
StatusFileSize
new1.26 KB

This is a patch to fix a couple of bugs in linkchecker_link_edit_form which stop the "Edit link settings" from working.

polynya’s picture

Category: feature » task
StatusFileSize
new13.79 KB

This patch is a port to Drupal 7 of nkschaefer's Views integration.

polynya’s picture

StatusFileSize
new4.49 KB

This patch fixes a few problems in the LinkCheckerLinkExtractionTest.

yukare’s picture

#83
Thanks by this.

#84
A really big thanks by this one, and a very good thing to have :)

#85
Parts of it are already on #79, or you patch is over the patch at #79?

polynya’s picture

I hadn't tried patch #79, it looks like it covers everything in patch #85.

hass’s picture

I've committed #79 now with a few code style issues. Some listed here. Let's follow up with the other patches now. The VIEWS patch in #84 need to get posted to #965720: Views integration for Broken Links report. This one have nothing to do with the D7 update.

+++ b/linkchecker.moduleundefined
+++ b/linkchecker.moduleundefined
@@ -194,7 +194,6 @@ function _linkchecker_status_handling($link, $response) {

@@ -418,6 +382,50 @@ function _linkchecker_status_handling($link, $response) {
+function _linkchecker_replace_fields($entity, $entityType, $bundle, $oldURL, $newURL) {
...
+      // this is because a php error
+      $entityField =& $entity->$name;

Do we also need the "continue" here?

+++ b/linkchecker.moduleundefined
@@ -208,132 +207,97 @@ function _linkchecker_status_handling($link, $response) {
+          debug('aqui3');

Has been removed.

+++ b/linkchecker.moduleundefined
@@ -622,7 +630,8 @@ function _linkchecker_add_node_links($node, $skip_missing_links_detection = FALS
+      	$link = new stdClass();
+      	$link->urlhash = $urlhash;

I fixed these indentation bugs.

hass’s picture

+++ b/linkchecker.moduleundefined
@@ -418,6 +382,50 @@ function _linkchecker_status_handling($link, $response) {
+    if (@in_array($entityType, $field['bundles'][$bundle])) {

We need to remove the @ here or we may not see errors when they happen. It's a rule in Drupal not to suppress errors. Catch them and handle the error or make it more save.

hass’s picture

Committed #83, THX!

hass’s picture

+++ b/linkchecker.testundefined
@@ -284,7 +271,7 @@ EOT;
-    $this->verbose(theme_item_list($urls_relative, 'Verify if following relative URLs exists:'));
+    $this->verbose(theme_item_list(array('items' => $urls_relative, 'title' => 'Verify if following relative URLs exists:' , 'attributes' => array(), 'type' => 'ul')));

These change from #85 has been committed. All other changes failed to apply.

hass’s picture

+++ b/linkchecker.moduleundefined
@@ -594,7 +604,10 @@ function linkchecker_block_custom_delete_form_submit($form, &$form_state) {
-  $url_options = empty($node->language) ? array('absolute' => TRUE) : array('language' => $languages[$node->language], 'absolute' => TRUE);
+  $url_options =
+  	(empty($node->language) || empty($languages[$node->language]))
+  	? array('absolute' => TRUE)
+  	: array('language' => $languages[$node->language], 'absolute' => TRUE);

How about this fix from #74?

yukare’s picture

#88
Question about continue: this come from #74, if $entity->$name is empty, the rest do not need to run and may cause warnings, but i do not remember that i had warnings about this.
Thanks by removing that debug() and fixing the identation bugs.

#89
I did it wrong, we must not have @ anywere as you said, i will do a patch to fix this.

#91
The patch at #85 was done without the patch at #79, so most of the fixes at #79 about tests are too on #85.

#92
I did not had a issue without it, may FlavioHuesler can explain why it is necessary.

Please, can you consider making at last a dev release of it? For me it need work only on small things, so we need more people to use and test it before you make a stable release.

hass’s picture

There is a dev... But not on project home.

hass’s picture

After reading all comments again this seems to be the remaining TODO list:

* #42, links module links not filtered correctly into URL
* #49, blocks upgrade missing
* #54, #74, 76# Review entity stuff
* #82, $languages[$node->language], $language can be empty!???
* _linkchecker_link_replace(): I'd like to get rid of all the error prone regexes and move to HTML dom. We can get this from http://drupal.org/project/gotwo where I've already done it.
* camel case variables introduced by patches in this case need to be wiped out.
* _linkchecker_unpublish_nodes() review how we can get benefit from node_load_multiple(). (low prio)

hass’s picture

@yukare: Can you explain how it is possible that a block do not have "info" or "title" field defined?

  // Remove a warning if it is not defined.
  if (isset($block_custom->info)) {
    $text_items[] = _filter_url($block_custom->info, $filter);
  }
  if (isset($block_custom->title)) {
    $text_items[] = _filter_url($block_custom->title, $filter);
  }

Äh - nodes??? We hook into block save, not nodes. And how is it possible that a block do not have a format?

  // The $block_custom variable can come from a scan on nodes and blocks or from editing a block
  // and the object is not the same from both sources.
  if (isset($block_custom->format)) {
    $text_items[] = _linkchecker_check_markup($block_custom->body, $block_custom->format);
  }
  else {
    $text_items[] = _linkchecker_check_markup($block_custom->body['value'], $block_custom->body['format']);
  }

Here is an updated blocks code with caching enabled, but I still do not understand the comments.

  // Prevent from warnings if fields are not defined.
  if (!empty($block_custom->info)) {
    $text_items[] = _filter_url($block_custom->info, $filter);
  }
  if (!empty($block_custom->title)) {
    $text_items[] = _filter_url($block_custom->title, $filter);
  }
  // The variable $block_custom can come from a scan for blocks or from editing
  // a block and the object is not the same from both sources.
  if (isset($block_custom->format)) {
    $text_items[] = _linkchecker_check_markup($block_custom->body, $block_custom->format, '', TRUE);
  }
  else {
    $text_items[] = _linkchecker_check_markup($block_custom->body['value'], $block_custom->body['format'], '', TRUE);
  }
yukare’s picture

#95
I will look at this list this week and work on it.

#96
When you scan and when you edit a block, we have two different objects! For me this is a drupal bug, and is very strange, but is real. We can have $block_custom->info or $block_custom->title for the same title depending from where the block comes.
Because of this we have those ifs, to see if we have one case or another.

polynya’s picture

StatusFileSize
new1.7 KB

Absolute URIs are specified for redirects by RFC2616 section 14.30, but many web servers use relative URIs. In these cases the response from drupal_http_request is error code -1002 and error message "Missing schema". This is not very informative and doesn't tell the user if the link is really broken.

This patch tries to build the absolute URI using the scheme, host and port from $link->url and the path set to $response->redirect_url. The new URI is used to get the response from drupal_http_request.

polynya’s picture

StatusFileSize
new1.69 KB

If the response from drupal_http_request is code 200 and the redirect code is not 301 then the link is not updated in the linkchecker_link table and the code is left as -1.

This patch processes these links. They are successful if the redirect code is in the list "Don't treat these response codes as errors" and is in the range 300-399. Links with other redirect codes are failed.

polynya’s picture

Sorry, the file names for patches #98 and #99 have the wrong issue number. Here are the correct files.

hass’s picture

Don't post unrelated issues to this case here, please. Ignored responds codes are not logged.

polynya’s picture

OK, I have created new issues for #98 and #99.

It's not clear to me what should be included in this issue. What about problems with blocks caused by changes in D7?

hass’s picture

If you would read the issue or #95 you would know that blocks code has not yet committed.

geek-merlin’s picture

memo to me: subscribe and look for some time to help porting this great module!

geek-merlin’s picture

Title: D7 porting to do's » D7 porting to do's for Link checker
Gerben Zaagsma’s picture

subscribing

stefanx’s picture

Assigned: Unassigned » stefanx

I would love to see link checker for Drupal 7.

stefanx’s picture

Assigned: stefanx » Unassigned

Correcting assignment.

paulgemini’s picture

subbing!

mike503’s picture

+1 subscribe

BigBrother2010’s picture

+1

astutonet’s picture

+1 subs

nkschaefer’s picture

What's the status of this? It looks like the last time anyone made any changes was over the summer.

Since I don't trust myself to find and apply all the right patches, could someone please upload a zip or tar of the Drupal 7 version we have of the module so far? Or consolidate all the right/needed patches into one patch? Or even create a sandbox project for the Drupal 7 port of the module? This module is critical for my Drupal 7 site, and I'd be happy to work on this, test it out, and provide patches to get a working D7 version of Link Checker out. Just please point me in the right direction.

Thanks,
Nathan

mgifford’s picture

@nkschaefer It's useful to run it through http://drupal.org/project/coder to pull out the obvious stuff and create a patch.

Once there's a patch to review and errors reported it's easier to find the remaining places required to make the upgrade.

Please give it a try and post questions if you run into difficult. Also, work from the git HEAD repository for most projects.

hass’s picture

There is a dev, you don't need to patch anything yourself

mgifford’s picture

@hass - damn.. or rather excellent! However it isn't listed here yet:
http://drupal.org/project/linkchecker

So probably in git but no release made from what I can tell.

Can someone make an official dev release for this project and then we can mark this issue closed & start tackling the bugs with the dev release individually?

hass’s picture

There IS a dev. Click all releases.

hass’s picture

* #82 FIXED.
* #98 core issue. Case has already moved over to core.
* #99 FIXED.
* #100 FIXED.
* Camel case variables introduced by patches in this case need to be wiped out. FIXED
* #49, blocks upgrade. FIXED
* #42, links module links not filtered correctly into URL. FIXED

Optional:
* #54, #74, 76# Review entity stuff
* _linkchecker_link_replace(): I'd like to get rid of all the error prone regexes and move to HTML dom. We can get this from http://drupal.org/project/gotwo where I've already done it.
* _linkchecker_unpublish_nodes() review how we can get benefit from node_load_multiple(). (low prio)

hass’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

hass’s picture

Status: Fixed » Closed (fixed)

This is a call for testers for final D7 DEV.

In last 6 weeks the patches in this case have proven to be unreliable and unfinished upgrades with many D6 specific leftovers. I've got some helpful feedback from a few guys about some easy to fix issues, but if there is more to fix we should know asap.

I believe it's now nearly bug-free and very stable code, but better safe than sorry.

hass’s picture

Status: Closed (fixed) » Fixed

Setting fixed so it shows up in queue again.

Automatically closed -- issue fixed for 2 weeks with no activity.