Problem/Motivation
Provide the option to strip HTML tags from the diff.
Remaining tasks
- Make the link display either 'Hide markup' or 'Show markup' (http://drupal.org/node/372957#comment-5834472)
Original report by erykmynn
In looking for a way to make DIFF and a WYSIWYG editor play nice, I came across this thread to which I was unsure what had ever happened... http://drupal.org/node/125494
I found several errors in the patch code for modifying DIFF to strip HTML and work with WYSIWYG editors. I have created a fixed version of /diff/node.inc that works on my 5.x system to make WYSIWYG and DIFF play nice.
This is a critical component for us and I hope it helps someone. I'm not competent with patches so I'll upload the whole file.
Hope this helps someone, if not please help direct me into why I wasn't able to get it to work without modifying the code (someone said the changes we're rolled in, but I don't see this to be the case).
Sorry if I seem a little lost, this is my first attempt at contributing.
Comment | File | Size | Author |
---|---|---|---|
#100 | diff-hide-show-markup-372957-100.patch | 801 bytes | andrew_rs |
#93 | diff-hide-show-markup-372957-93.patch | 7.7 KB | c31ck |
#78 | diff-html-strip-dev-1-2-12b.patch | 7.27 KB | jzornig |
#76 | diff-html-strip-dev-1-2-12.patch | 7.5 KB | jzornig |
#73 | 372957-diff-strip-html-73.patch | 6.49 KB | ezra-g |
Comments
Comment #1
JuliaKM CreditAttribution: JuliaKM commentedThanks for posting this. I took your code and made a few changes for Drupal 6. I'm attaching a patch for Diff 6.x-2.0.
Comment #2
JuliaKM CreditAttribution: JuliaKM commentedSorry, my patch didn't seem to attach. Here we go again.
Comment #3
JuliaKM CreditAttribution: JuliaKM commentedComment #4
JuliaKM CreditAttribution: JuliaKM commentedComment #5
lsrzj CreditAttribution: lsrzj commentedI liked that patch, it's working well as I could notice until now. If I find some problem, I'll tell.
Comment #6
brainski CreditAttribution: brainski commentedI'm using the patch too and it's working well.
One point someone could improve:
All entities like "Umlaut" ü %ouml; are note replaced with ü ä and so one.
If someone has an easy idea how to replace them, please post it here.
Comment #7
philbar CreditAttribution: philbar commentedsubscribe
Comment #8
lsrzj CreditAttribution: lsrzj commentedI made a generic patch using html_entity_decode() and it seems to be working ok. It was not tested deeply enough to make sure it is. That way all possible HTML entities will be decoded including the one suggested. I tested with portuguese accentuation marks because is my mother language and it decoded everything right. Using this PHP function requires PHP 4 >= 4.3.0 and above. An observation, patch the original node.inc without any other patch otherwise this patch will fail to be applied.
Comment #9
JuliaKM CreditAttribution: JuliaKM commentedThe patch applied cleanly. However, I found that when I look at multiple revisions to a node, the diff only strips HTML for the older ones.
To reproduce:
1. Make multiple revisions to a node (with revision moderation and FCKeditor turned on)
2. View revisions showing a diff between the current version and the oldest version
Comment #10
lsrzj CreditAttribution: lsrzj commentedI don't use revision moderation module that's why I didn't see that. I don't know if I'll fix it for use with revision moderation. I don't understand Drupal in it's inner parts, don't know anything about module integration and interaction. I simply saw your code, understood how you were doing the substitution and changed it.
Comment #11
lsrzj CreditAttribution: lsrzj commentedI did the patch to decode the HTML entities not to strip HTML tags. The entities it's decoding are those like   ã and all of the others.
Comment #12
patchak CreditAttribution: patchak commentedI applied this patch, adn nothing changed on my site, all the HTML is still there...I also emptied cache, so I think this patch does not work under all circumstances....
I use diff latest dev version..
This is what I get in the terminal window :
imac-de-alexis-dufresne:temp alexis$ patch < Diff_HTMLtags_juliakm_021909_2.patch
patching file node.inc
Seem like patch is applied, but is there any way to actually make sure I'm using a patched file?
thanks,
Patchak
Comment #13
lsrzj CreditAttribution: lsrzj commentedUse patch that is on post #8 instead, I know it works because it's the one I use on my site. But please patch the original node.inc that comes with the module without any other patch.
Comment #14
ggevalt CreditAttribution: ggevalt commentedI used the patch in post #8 and it worked splendidly. Thanks to the person who did it and we hope it becomes part of the module. We did not test on previously created nodes because we are using this module and patch with all new installations.
Thanks again.
geoff
Comment #15
SeanBannister CreditAttribution: SeanBannister commentedsub
Comment #16
erykmynn CreditAttribution: erykmynn commentedPerhaps this needs to be an option, or tied to whether or not a WYSIWYG editor is in place? I can imagine stripping tags from DIFF would be a real headache for anyone writing page body code by hand.
Comment #17
erykmynn CreditAttribution: erykmynn commentedSo is anyone still following this? I'm wondering what the next steps are.
Probably either to:
Add an option for stripping tags in Diff
or
Make it do so gracefully, not showing extraneous tags, but somehow showing markup that DID change....
Comment #18
philbar CreditAttribution: philbar commentedWhy can't the HTML be rendered in the Diff and CSS be used to highlight differences? I feel this would be the most intuitive way to handle WYSIWYG.
For instance, if someone changes the heading size from
<h2>
to<h6>
a div can be placed around the change and used to highlight the difference.Like this:
The code above would be rendered like normal html and the .diff-delete and the .diff-add will be highlighted red and green using CSS.
Additionally, a button could be added to switch between this new, rendered diff and the classic, code diff.
Comment #19
erykmynn CreditAttribution: erykmynn commentedThis is a perfectly reasonable option, and probably preferable to either no html or plain html for a lot of people (but not everyone). The use case that originally brought me to this is end users communicating to each other collaborating via posts with a WYSIWYG editor. For these people, even seeing the HTML makes them think they broke something.
So I can envision a couple different options that would cover the bases of almost everyone....
1) Classic Code Diff
2) Highlighted (or lowlighted) Tag Diff
3) Tags only show when changed Diff, highlighted
4) No tags in Diff, text only
And really one of these might be redundant.
Comment #20
SeanBannister CreditAttribution: SeanBannister commentedJust tested the patch and it works great.
I think this should be the default option for Diff, but once your viewing the Diff there could be a button "Show HTML" which could toggle the HTML on and off by using jQuery. This would mean each html tag would need to be wrapped in a div with a class that jQuery could target.
Comment #21
ggevalt CreditAttribution: ggevalt commentedOne side issue, by the way, which we've worked around.... The Diff module in either form -- stripped of html or not -- renders the font color and style functions in FCKeditor inoperable. Esssentially the Diff module CSS needs overrule the FCKeditor settings with respect to color of font and background font colors.....
geoff
Comment #22
lsrzj CreditAttribution: lsrzj commentedI was the author of the patch on #8. I used the original patch posted here but was not decoding the entities and I simply added the decode instruction. But I really stopped following here now I'm back. I think it could be an option if there is somebody documenting real HTML code using Drupal Diff module to make version control of it.
Comment #23
BenK CreditAttribution: BenK commentedSubscribing...
Comment #24
MarcusF CreditAttribution: MarcusF commentedsubscribe
Comment #25
brianmercer CreditAttribution: brianmercer commentedsubscribed
Comment #26
drpitch CreditAttribution: drpitch commented#8: Cool patch. Thanks ;)
Comment #27
Azol CreditAttribution: Azol commentedPatch works great, hoping for it to become available as a Diff option/setting.
Subscribing.
Comment #28
kcyarn CreditAttribution: kcyarn commentedThanks for posting this. It came very close to what I needed. I edited the replacement values to
This covered everything my users commonly use and removed the double spaces between elements.
Comment #29
skylord CreditAttribution: skylord commentedSubscribing
Comment #30
s4j4n CreditAttribution: s4j4n commentedsubscribing
Comment #31
Glottus CreditAttribution: Glottus commentedsubscribing
Comment #32
fanzila CreditAttribution: fanzila commentedsusbscribe
Comment #33
jzornig CreditAttribution: jzornig commentedsubscribe
Comment #34
Azol CreditAttribution: Azol commentedThe patch didn't make it to the new version (2.1). I manually applied the patch to node.inc and it worked well.
Comment #35
andrew_rs CreditAttribution: andrew_rs commentedI've been messing around with this for a couple days,and have created a version with a checkbox to choose whether or not to remove markup for the diff. I haven't tested these patches to thoroughly, so please treat this as very experimental at the moment.
FYI, I used the latest (2010-Sep-10) dev version as my starting point.
Comment #36
andrew_rs CreditAttribution: andrew_rs commentedHere's another patch of node.inc that uses drupal_html_to_text for stripping HTML.
Comment #37
steinmb CreditAttribution: steinmb commentedsubscribing
Comment #38
andrew_rs CreditAttribution: andrew_rs commentedFrom #18
This would be cool, but I think it would involve modifying the presentation to show the full body of both nodes (tables, nested lists, etc). I could see this making the diff view necessitate both vertical and horizontal scrolling in many cases. Regardless, this might be an option to pursue.
Comment #39
hlan CreditAttribution: hlan commentedHi Andrew_rs,
I used your patches and applied on the latest Diff dev module. And I tried it with CKEditor to see whether WYSIWYG is working on Diff. It can give all differences between revisions but there are HTML tags showing in the new changes. Is it supposed to happen like that? Is there any possible way to remove those tags in comparison to make it more user friendly?
Thanks
Hlan
Comment #40
andrew_rs CreditAttribution: andrew_rs commentedHere is a new batch of patches:
Changes from the last round include:
- To get the "view changes" button on edit node pages to work again
- The current patch implements this under diff_diffs_show in pages.inc. It might make more sense to move this to _diff_body_rows
Comment #41
andrew_rs CreditAttribution: andrew_rs commentedhlan,
Please try the patches that I posted in #40
And please post back if you run into any problems or have any suggestions.
Comment #42
hlan CreditAttribution: hlan commentedHi Andrew,
Thanks alot for new patches. Markup can be hidden without any problem but there's a problem with puntuation mark-apostrophe is showing in HTML format "&rsquo". Apart from that I couldn't find any error at all.
Thanks
Hlan
Comment #43
philbar CreditAttribution: philbar commentedThis is Google Docs approach:
http://googledocs.blogspot.com/2010/09/more-tools-for-viewing-document.html
I'm in favor of a CSS technique rather than a side-by-side comparison like we use to compare code.
We should just add
<span class="diff-remove">
or<span class="diff-add">
to changes in the HTML then add some CSS to display the changes fully rendered:Comment #44
ggevalt CreditAttribution: ggevalt commentedNice link, philbar. Yes, I would agree that the googledocs approach is nice and it would be great to see in the diff module. As a longtime professional journalist, the versioning system is a great tool for writing and editing. I'm now working in the education field and teachers -- and students -- love the versioning of the diff module, but they sometimes are confused by the side-by-side approach and figuring intuitively what was changed and how.
So I would love to see some further development of this.
And many many thanks to all who've been working on this. This is a module of great value. And it is appreciated out here in the hinterlands. We'd be most happy to test any future developments with students and kids...
g
Comment #45
yang_yi_cn CreditAttribution: yang_yi_cn commentedsubscribe
Comment #46
MurzSubscribe. Can you post the screenshots how this looks now in Drupal?
Comment #47
bradleybacon CreditAttribution: bradleybacon commentedsubscribe
Comment #48
jzornig CreditAttribution: jzornig commentedI tried the patches in #40 on the current dev version and while I get a Hide/Show Markup link I always see markup. I also get the following error.
warning: Missing argument 4 for diff_diffs_show() in /var/www/html/ppl/sites/all/modules/diff/diff.pages.inc on line 166.
Comment #49
MurzMaybe we can use markdownify library for convert html to plain text and after that do the diff for plain text?
http://milianw.de/projects/markdownify/
Or another tools we can use: http://htmldiff.codeplex.com/
http://code.google.com/p/daisydiff/
Comment #50
jzornig CreditAttribution: jzornig commentedSorry, I realised I had changed my nodes to use a cck textfield rather than the node body. The patch in #40 only works for node body.
i just wrote a function that I included in a tiny custom module that does this for all cck textfields. Hope it will be of help to others.
Comment #51
andrew_rs CreditAttribution: andrew_rs commentedjzornig,
Thanks for the contribution. I'll incorporate it into future patches.
Comment #52
andrew_rs CreditAttribution: andrew_rs commentedMurz,
Attached are a couple creen shots, the first shows the checkbox on the revisions page that allows you to show or hide markup, the second show the show/hide markup toggle link on a diff page.
Comment #53
MurzI think that will be better to do the inline diff for visually show text difference, like in this screenshot: http://globalmoxie.com/bm~pix/editor_diff-text~s600x600.png
Changed html tags or styles we can highlight via different color.
For example first revision:
second revision
Highlight must be:
Comment #54
MurzComment #57
1001uzman CreditAttribution: 1001uzman commentednode.inc last version link please?
Comment #58
tayzlor CreditAttribution: tayzlor commentedanyone know what is the latest with this patch? are the patches in #40 ready for review?
Comment #59
alexpottsubscribe
Comment #60
alexpottI've refactored the patch making some of code shorter but not changing the logic. I've also allowed sitebuilders to set a default for each node type as to whether of not to include markup in the diff. This default also applies to the "view changes" button on the node edit form (if enabled).
This patch also fixes a clash with the revisoning module. Basically if you had this patch and revisioning installed then the compare button on the revisions list would not work.
Comment #61
alexpottSmall bug in last patch meant that it was comparing old with old when comparing version with markup.
New patch attached.
Comment #62
alexpottDoh! Proper patch attached...
Comment #63
jvieille CreditAttribution: jvieille commentedsub
Comment #64
jvieille CreditAttribution: jvieille commentedExcellent!
Doing some testing, maybe some improvement would be appreciated in the way the way Diff breaks the changes.
For example, on the attached screenshots after adding the only line "Membre du groupe joint SEE-SC/AFIS, vous êtes invités à commenter / éditer ce programme"
- Diff shows 3 modifications (in red) in place of a single one
-> the first split occurs at the "/" sign when surrounded by spaces ("SC/AFIS" passed, "commenter / éditer" did not)
-> the third highlighted modification is definitely wrong
- Diff did not stop at the end of the line / sentence - it displays the remain of the table in a number of wrong additions split randomly
I think the attached screenshots are explicit enough
Bellow the full html code for this page (generated by CKEditor)
Thanks
Comment #65
carl.ben CreditAttribution: carl.ben commentedsub
Comment #66
croryx CreditAttribution: croryx commentedSubscribing.
Comment #67
lelizondo CreditAttribution: lelizondo commentedPatch in #62 works great. Thanks.
Comment #68
testerz123456 CreditAttribution: testerz123456 commentedPatch #62 worked well for me, too.
Subscribing.
Comment #69
justcaldwellSubscribing
Comment #70
SeanBannister CreditAttribution: SeanBannister commented@justcaldwell stop subscribing, start following
Comment #71
majorbenks CreditAttribution: majorbenks commentedIs this patch also working for drupal 7? Will there be a patch for d7?
Comment #72
mstef CreditAttribution: mstef commentedPatch @ 62 good.
Comment #73
ezra-g CreditAttribution: ezra-g commentedThe patch in #62 no longer applied to the dev branch with git apply. I applied with patch -p1 and re-rolled.
This works as expected in my testing and looks solid from a code perspective. Based on this and the above positive reviews, I think the broader issue is RTBC.
Leave in "Needs review" status as I've re-rolled the patch.
Comment #74
lnspham CreditAttribution: lnspham commentedComment #75
DuaelFrHi,
I just tried to review this patch but it does not apply anymore on 7.x dev branch.
Regards.
Comment #76
jzornig CreditAttribution: jzornig commentedI rerolled the patch for the 1-2-12 dev.
Comment #77
jzornig CreditAttribution: jzornig commentedPlease ignore my rerolled patch #76 it has a number of issues.
Comment #78
jzornig CreditAttribution: jzornig commentedHopefully this works a bit better.
Comment #79
Azol CreditAttribution: Azol commentedjzornig: Excuse my ignorance, but what is a "1-2-12 dev"?
#73 applies cleanly to 6.x-2.x-dev, but what about adding setting "Hide HTML markup by default"? Could be a good idea for those who want to hide it permanently without the need to click on the link all the time.
Comment #80
jzornig CreditAttribution: jzornig commentedHi Azol,
1-2-12 is the date of the 7.x dev release I built the patch against.
There is a checkbox in the diff section of the content type edit form "Remove markup by default when comparing body text". Although this should be moved into the field instance edit form as it currently applies to all fields and not just "body" as the label suggests. I'll look at making this change, but as it is if you had html in multiple fields you would most likely want to strip it from all or not.
Comment #81
Azol CreditAttribution: Azol commentedThat was a bit misleading, because version this issue reports is still 6.x. Should we change it to 7.x and then backport to 6.x?
Comment #82
jzornig CreditAttribution: jzornig commentedSetting to 7.x-1.x-dev
Comment #83
jzornig CreditAttribution: jzornig commentedsorry should be 7.x-2.x-dev
Comment #84
Azol CreditAttribution: Azol commentedJust do not want to postpone the 6.x patch (#73), this issue is - what? - 3 years old!
+1 RTBC
Comment #85
julien.reulos CreditAttribution: julien.reulos commentedPatch is working great, thank you so much for this one! +1 for 7.x patch RTBC as well.
Comment #86
jzornig CreditAttribution: jzornig commentedI'm with Azol, I've been using the patch #40 on a D6 site with additional code from #50 in a custom module since 2010. But now I'm starting to use diff on D7 sites and I needed a working patch for that immediately. It would be good to get HTML strip into diff releases for both D6 and D7.
Comment #87
Azol CreditAttribution: Azol commented#73 is the current patch for 6.x-2.x-dev
#78 is the current patch for 7.x-2.x-dev
I tested #73 only and it's working great.
Comment #88
soulfroys#73 applied in the last 6.x-2.x-dev (2011-Aug-07). Thanks so much!
Comment #89
realityloop#73 committed to 6.x-2.x-dev branch.
I'm not happy with how #78 currently displays, to reduce confusion the Hide/Show link should only display either Hide or Show based on what clicking it will do at any time.
I'd prefer that when you hide the HTML tags boths sides had tags hidden, though this could be confusion caused by the way the link is currently titled.
Comment #90
drurian CreditAttribution: drurian commentedDo the patches work for CCK fields?
Comment #91
davidhunter CreditAttribution: davidhunter commentedHi All - I applied the patch from #78 of this threa - but I get two errors.
This is my first time applying a patch to a module, so not entirely sure that I am doing it properyl.
Can someone check my output ?
It would be greatly appreciated :
[root@dev diff]# patch -p0 --dry-run < "diff-html-strip-dev-1-2-12b.patch "
bash: diff-html-strip-dev-1-2-12b.patch : No such file or directory
[root@dev diff]# patch -p0 --dry-run < "diff-html-strip-dev-1-2-12b.patch"
can't find file to patch at input line 4
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff -Naur a/diff/diff.module b/diff/diff.module
|--- a/diff/diff.module 2012-02-01 07:59:31.000000000 +1000
|+++ b/diff/diff.module 2012-02-28 16:29:05.000000000 +1000
--------------------------
File to patch: diff.module
patching file diff.module
Hunk #2 succeeded at 179 with fuzz 2 (offset -8 lines).
Hunk #3 succeeded at 210 (offset -8 lines).
can't find file to patch at input line 38
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff -Naur a/diff/diff.pages.inc b/diff/diff.pages.inc
|--- a/diff/diff.pages.inc 2012-02-01 07:59:31.000000000 +1000
|+++ b/diff/diff.pages.inc 2012-02-28 16:39:33.000000000 +1000
--------------------------
File to patch: diff.pages.inc
patching file diff.pages.inc
Hunk #1 succeeded at 152 (offset -1 lines).
Hunk #2 succeeded at 165 (offset -1 lines).
Hunk #3 succeeded at 226 (offset -1 lines).
Hunk #4 FAILED at 265.
Hunk #5 succeeded at 273 (offset -1 lines).
Hunk #6 FAILED at 340.
2 out of 6 hunks FAILED -- saving rejects to file diff.pages.inc.rej
can't find file to patch at input line 127
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff -Naur a/diff/includes/node.inc b/diff/includes/node.inc
|--- a/diff/includes/node.inc 2012-02-01 07:59:31.000000000 +1000
|+++ b/diff/includes/node.inc 2012-02-28 15:21:36.000000000 +1000
--------------------------
File to patch: includes/node.inc
patching file includes/node.inc
Hunk #1 succeeded at 8 with fuzz 2 (offset -1 lines).
Hunk #2 succeeded at 35 (offset -1 lines).
Hunk #3 succeeded at 46 with fuzz 1 (offset -1 lines).
Hunk #4 succeeded at 54 (offset -1 lines).
Hunk #5 succeeded at 67 (offset -1 lines).
Comment #92
jzornig CreditAttribution: jzornig commentedDavid, patching depends on your relative location to the files being patched and the paths in the patch file, for this patch file if you are in sites/all/modules use patch -p1 but if you are in sites/all/modules/diff use patch -p2
Comment #93
c31ck CreditAttribution: c31ck commentedPatch based on #78 on that changes the link to either 'Show markup' or 'Hide markup' as per #89.
Comment #94
c31ck CreditAttribution: c31ck commentedSetting status to needs review.
Comment #95
davidhunter CreditAttribution: davidhunter commentedI applied the patch from Comment 93 -works perfectly!
Comment #96
Takki CreditAttribution: Takki commentedAlso applied the patch from #93 without problems, text of the link switches correct.
Comment #97
mgiffordNice.. I haven't tested this yet, but really think that this option will help non-technical people make better use of diff.
Comment #98
davidhunter CreditAttribution: davidhunter commentedVery true.
The Diff module is a huge benefit to our translation team. But they were hesitant to take advantage of the functionality since viewing the HTML seemed a bit daunting for them. A bonus is being able to toggle displaying the HTML display tags.
Comment #99
andrew_rs CreditAttribution: andrew_rs commentedI've been away from Drupal for a while (I know, shame on me), I decided to come back to this issue after a long hiatus to help me gear up for Drupalcon Munich, and am thrilled to see all of the progress that's been made.
#93 is working well for me as well
Comment #100
andrew_rs CreditAttribution: andrew_rs commentedI created a quick patch for 6.x-2.x that addresses #89 and removes the redundancy of the Hide/Show Markup link text.
Comment #101
sylus CreditAttribution: sylus commented#93 Works for me too, great job guys!
Comment #102
realityloopcommitted to 2.x
Comment #103
Alan D. CreditAttribution: Alan D. commentedThere is no update for the menu item, so does this work out of the box?
i.e. do you have to flush the menu for the new menu item to work? If so, then an update script is needed.
See #882186-6: Exception after upgrading to 2.1 from 2.0
And this is incompatible with the modified approach that I'm trying to implement. :(
i.e. why chain the request all the way from the menu item right down to the field level? It should be handled higher up in the chain.
So I'm going to skip this for the meantime and will try to implement another solution with the page callback higher up. So don't apply any patches after #30 in #1365750: Generalize API and Integrate with core field types, or this functionality will break.
Finally, the variable is 'remove_markup_default_'. This is not protected by the modules namespace. You should always prefix these variables with the module name to prevent potential clashes with other modules.
i.e. 'diff_remove_markup_default_'
Comment #104
Alan D. CreditAttribution: Alan D. commentedThe 3.x branch has a ported version, it would be great to get feedback on this.
I have left the links for both forms in the table: "Standard" & "Marked down", which differs from the committed patch and the variable is "diff_default_state_node", which has no admin UI yet.
http://www.example.com/admin/config/content/diff/fields has a UI where you can select the markdown option per field type. drupal_html_to_text() works well on text fields, but adds additional whitespace to other fields, so this is configurable. Text defaults to "drupal_html_to_text", the other fields do not get markdown options by default.
Comment #105
mitchell CreditAttribution: mitchell commentedSince this is now committed, I'm marking this issue as fixed.
If there are any bugs or improvements to be made, please post them in follow-up issues.
Comment #106.0
(not verified) CreditAttribution: commentedAdding remaining tasks and motivation.