
Use real HTML-Diff algorithm in Drupal Diff module

Project: Diff
Version: 6.x-2.x-dev
Component: Code
Category: feature request
Priority: normal
Assigned: yhahn
Status: needs work

Issue Summary

Suggestion

Daisy Diff can produce a real diff of HTML code instead of just stripping out the HTML tags. During GSoC 2008 the author wrote a PHP version for Wikipedia: http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/diff/

Maybe it could also be used for the Drupal Diff module?

Daisy Diff Description

Daisy Diff is a Java library that diffs (compares) HTML files. It highlights added and removed words and annotates changes to the styling.

  • Works with badly formed HTML that can be found "in the wild".
  • The diffing is more specialized in HTML than XML tree differs. Changing part of a text node will not cause the entire node to be changed.
  • In addition to the default visual diff, HTML source can be diffed coherently.
  • Provides easy to understand descriptions of the changes.
  • The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.
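To illustrate the idea of diffing HTML at the word level rather than stripping tags first, here is a minimal sketch in Python. It is not Daisy Diff's algorithm: it simply treats each tag as one indivisible token and runs the standard library's `difflib.SequenceMatcher` over the token lists. The names `tokenize` and `html_word_diff` are made up for this example.

```python
import re
from difflib import SequenceMatcher

# Whole tags, words, and whitespace runs become separate tokens, so a tag is
# never cut in half. The trailing "<" alternative keeps a stray "<" lossless.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+|\s+|<")


def tokenize(html):
    """Split HTML into tag, word, and whitespace tokens."""
    return TOKEN_RE.findall(html)


def html_word_diff(old, new):
    """Return the merged text with <del>/<ins> around changed tokens."""
    a, b = tokenize(old), tokenize(new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if op in ("delete", "replace"):
            out.append("<del>" + "".join(a[i1:i2]) + "</del>")
        if op in ("insert", "replace"):
            out.append("<ins>" + "".join(b[j1:j2]) + "</ins>")
    return "".join(out)


print(html_word_diff("<p>The quick fox</p>", "<p>The slow fox</p>"))
```

Only the changed word is marked, and the surrounding `<p>` element is left alone. A naive sketch like this can still emit badly nested markup when a tag itself changes, which is one of the cases a dedicated HTML differ has to handle.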

Demo of HTML-Diff: http://code.google.com/p/daisydiff/wiki/Examples

Comments

#1

I had a detailed look at daisydiff and started integrating it into the diff module. The problem is that the PHP implementation of daisydiff is very slow when documents are longer (>4 screen pages). I talked to Guy, the developer of daisydiff, and he confirmed this. The comparison does not deliver good results for longer documents. Therefore, I cannot recommend using daisydiff for the diff module.

#2

Assigned to: Anonymous » yhahn
Status: active » postponed

Thanks for this evaluation - please feel free to make active again if/when the performance of this library is improved.

#3

#4

Daisydiff's output does look nice; however, including it might be a non-starter due to license issues.

Drupal is released under GPL 2 and Daisydiff under Apache 2.

From Various Licenses and Comments about Them - GNU Project
http://www.gnu.org/licenses/license-list.html#GPLCompatibleLicenses

Apache License, Version 2.0
This is a free software license, compatible with version 3 of the GPL.
Please note that this license is not compatible with GPL version 2

#5

Like many other Drupal modules, it can be used as an optional third-party library. It does not have to be bundled with this module. It might also be possible to use the Java version.

#6

subscribing

#7

I've been playing around with a couple of diff packages. I'm hoping to have a test patch of one of them up and running soon. Hopefully, I'll post back within a few days.

#8

subscribe

#10

Status: postponed » needs work

I've created a modified version that uses PEAR's Text_Diff package, which must already be installed on your system.

It's still quite kludgy at the moment, but it appears to work decently enough, though only for nodes in the current version.

I've attached a screenshot along with the patch files.

Also, it might be worthwhile to combine this with http://drupal.org/node/372957 to create a simple configuration screen to toggle between the diff engines and whether or not to include markup.
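As a rough illustration of such a toggle, the sketch below (Python, standard library only) selects a diff engine by name and optionally strips markup before comparing. It is not taken from the attached patches; the names `ENGINES`, `compare`, and `strip_markup` are hypothetical, and the two engines are just `difflib` functions standing in for real diff backends.

```python
import re
from difflib import ndiff, unified_diff

TAG_RE = re.compile(r"<[^>]+>")


def strip_markup(text):
    """Crude tag removal, for illustration only."""
    return TAG_RE.sub("", text)


def unified_engine(old_lines, new_lines):
    # n=0 drops context lines so only the changes are listed.
    return list(unified_diff(old_lines, new_lines, lineterm="", n=0))


def ndiff_engine(old_lines, new_lines):
    return list(ndiff(old_lines, new_lines))


# Hypothetical registry: a settings form would store one of these keys.
ENGINES = {"unified": unified_engine, "ndiff": ndiff_engine}


def compare(old, new, engine="unified", include_markup=True):
    """Diff two revisions with the chosen engine and markup setting."""
    if not include_markup:
        old, new = strip_markup(old), strip_markup(new)
    return ENGINES[engine](old.splitlines(), new.splitlines())


# With markup stripped, a pure tag change produces no differences.
print(compare("<p>a</p>", "<b>a</b>", include_markup=False))
print(compare("<p>a</p>", "<b>a</b>", include_markup=True))
```

Keeping the engines behind a plain name-to-function mapping means a new backend could be added without touching the comparison code.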

Attachment              Size
pear-diff.png           7.78 KB
diff.css_.patch         281 bytes
diff.module.patch       1.32 KB
diff.pages_.inc_.patch  539 bytes
node.inc_.patch         566 bytes

#11

If you're interested you might also want to look at:
http://drupal.org/project/lifewire_diff

It's only available for Drupal 5, but it also uses PEAR Text_Diff and I used it as a jumping-off point for my experiment. It also allows users to select between the two-column and single-column diff views.
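For readers unfamiliar with the two layouts, here is a small Python sketch of how the same line diff can be rendered either way. It does not reflect how lifewire_diff or Text_Diff do it internally; `two_column` and `single_column` are names invented for this example, built on `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from itertools import zip_longest


def _opcodes(old_lines, new_lines):
    return SequenceMatcher(None, old_lines, new_lines, autojunk=False).get_opcodes()


def two_column(old_lines, new_lines):
    """Rows of (left, right); an empty string marks a missing side."""
    rows = []
    for _op, i1, i2, j1, j2 in _opcodes(old_lines, new_lines):
        pairs = zip_longest(old_lines[i1:i2], new_lines[j1:j2], fillvalue="")
        rows.extend(pairs)
    return rows


def single_column(old_lines, new_lines):
    """One list of lines prefixed with '  ', '- ', or '+ '."""
    rows = []
    for op, i1, i2, j1, j2 in _opcodes(old_lines, new_lines):
        if op == "equal":
            rows += ["  " + line for line in old_lines[i1:i2]]
        else:
            rows += ["- " + line for line in old_lines[i1:i2]]
            rows += ["+ " + line for line in new_lines[j1:j2]]
    return rows


old = ["a", "b", "c"]
new = ["a", "x", "c", "d"]
print(two_column(old, new))
print(single_column(old, new))
```

Both views come from the same list of edit operations, so the choice between them is purely a presentation setting.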
