Rich text editor (WYSIWYG): Greater than sign > turns into &gt;

wpanssi - August 2, 2008 - 07:05
Project:GeSHi Filter for syntax highlighting
Version:6.x-1.1
Component:Filtering
Category:support request
Priority:normal
Assigned:Unassigned
Status:needs review
Description

When using the GeSHi filter, the > sign in my code turns into &gt;.
Is there a way to deal with this? I need my > signs to stay as they are.

Thanks in advance!

#1

soxofaan - August 2, 2008 - 09:42

What do you mean by "I need my > signs to stay as they are"?

You can enter ">" as plain text in the textfield,
but in HTML markup the special characters "<", ">" and "&" are escaped as entities, which is why ">" is encoded (in the markup) as "&gt;".
When you view the page in your browser, however, it shows up as ">" again.

So normally the ">" you enter as text should show up as ">" in your browser view.
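A minimal JavaScript sketch of that round trip, assuming an escaping step roughly like Drupal's check_plain() (which wraps PHP's htmlspecialchars() with ENT_QUOTES). The function names encodeForMarkup and renderInBrowser are hypothetical, and the second one only imitates what a browser displays for these five entities:

```javascript
// Escape the characters that are special in HTML markup
// (an approximation of check_plain()/htmlspecialchars(..., ENT_QUOTES)).
function encodeForMarkup(text) {
  return text
    .replace(/&/g, "&amp;") // first, so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Stand-in for what the browser shows: turn those entities back into characters.
function renderInBrowser(markup) {
  return markup
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#039;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;gt;" decodes once to "&gt;"
}

const typed = "if (a > b)";
const markup = encodeForMarkup(typed); // "if (a &gt; b)" in the page source
const shown = renderInBrowser(markup);  // "if (a > b)" on screen
```

The point is that "&gt;" in the markup is expected; it only becomes a problem when the text is escaped twice before it reaches the browser.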

Are there other input filters in the input format you use?
Do you, by any chance, have the HTML filter enabled after the GeSHi filter, for example?

#2

wpanssi - August 12, 2008 - 15:36

My browser shows the ">" as "&gt;".
I do have TinyMCE installed. Could that cause the problem? Is it possible to use both a rich text editor and the GeSHi filter when editing a node?

What do you mean by the HTML filter enabled after GeSHi? The input format of my node's body is GeSHi.

#3

soxofaan - August 12, 2008 - 17:15
Status:active» won't fix

I do have TinyMCE installed. Could that cause the problem? Is it possible to use both rich text editor and geshifilter when editing a node?

TinyMCE is probably the culprit.
Using a rich text editor in combination with a filter like the GeSHi filter is not something I would recommend.
It's too easy to create a conflict between the two, for example with characters that need proper encoding (like ">", "<", "&", ...).
When you enter ">" in a rich text editor, it is usually translated to "&gt;" client side, before any server-side filtering (like the GeSHi filter).
When the GeSHi filter gets "&gt;" as input, it can't know whether it was intended as "&gt;" or as ">", because it can't know whether the rich text editor was active. That's why GeSHi assumes nothing and just handles "&gt;" as "&gt;", with the result you're experiencing.
As far as I know, this sort of problem needs to be solved in the rich text editors.
So, I'll mark this as "won't fix" for GeSHi filter.
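The double encoding described above can be sketched like this. It is an illustration under assumptions: richTextEditorSubmit and highlightFilter are hypothetical stand-ins, not TinyMCE or GeSHi filter code, and both are assumed to escape with the same simple function:

```javascript
// Simple HTML escaping, assumed to be shared by both steps.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Step 1, client side: the rich text editor stores ">" as "&gt;".
function richTextEditorSubmit(typed) {
  return escapeHtml(typed);
}

// Step 2, server side: the highlighter treats its input as plain source code
// and escapes it again before wrapping it in markup.
function highlightFilter(source) {
  return "<pre>" + escapeHtml(source) + "</pre>";
}

const typed = "a > b";
const stored = richTextEditorSubmit(typed); // "a &gt; b"
const output = highlightFilter(stored);      // "<pre>a &amp;gt; b</pre>"
// A browser renders "&amp;gt;" as the literal text "&gt;": the reported symptom.
```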

What do you mean HTML filter enabled after the Geshi? The input format of my node's body is geshi.

In Drupal you have input formats and input filters.
An input filter does one type of text filtering, like GeSHi highlighting or transforming a URL into a clickable link.
An input format is a set, or pipeline, of one or more input filters that are executed one after another in a certain order.
Node bodies are filtered by input formats, so you can have several input filters active for filtering the text.
That's why I asked whether an input filter like the HTML filter is active in your input format.
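An input format can be pictured as a list of filter functions applied in order. In this JavaScript sketch the two filters are hypothetical simplifications (a highlighter that escapes its input and wraps it in <pre>, and an HTML filter configured to escape all tags); it only shows why running an escaping filter after the highlighter makes "&gt;" visible:

```javascript
// Hypothetical shared escaping helper.
const escapeAll = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Highlighter stand-in: escapes the code itself and wraps it in <pre>.
const highlightFilter = (text) => "<pre>" + escapeAll(text) + "</pre>";

// HTML filter stand-in, set to "escape all tags".
const htmlEscapeFilter = (text) => escapeAll(text);

// An input format runs its filters one after another, in order.
function applyInputFormat(text, filters) {
  return filters.reduce((acc, filter) => filter(acc), text);
}

const body = "a > b";
const good = applyInputFormat(body, [highlightFilter]);
// "<pre>a &gt; b</pre>", which the browser shows as: a > b
const bad = applyInputFormat(body, [highlightFilter, htmlEscapeFilter]);
// "&lt;pre&gt;a &amp;gt; b&lt;/pre&gt;", shown as: <pre>a &gt; b</pre>
```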

But anyway, your problem apparently has to do with rich text editing, as noted above.

#4

wpanssi - August 15, 2008 - 21:54

OK. I'll let you know if I figure out some way to deal with this.

#5

chinka - November 5, 2008 - 19:03
Version:5.x-2.8» 6.x-1.1

I have the same problem with FCKEditor, and I still don't have a solution.

Can anyone help us with this?

#6

glorinand - January 13, 2009 - 16:01
Status:won't fix» needs work

Hi there. Just about every WYSIWYG editor will convert < and > to &lt; and &gt;, and that's the correct behaviour... However, GeSHi itself probably uses check_plain() somewhere, thus converting to HTML entities again... My quick solution was to decode entities and strip tags before sending the text to be highlighted. Anyway, the attached patch does it. I've tested it in FCKEditor and in TinyMCE.

Now, you can certainly use it to fix your problem, but I am not quite convinced that this would be a good thing if you do not use a WYSIWYG editor. Anyway, let the maintainer decide... Surely, though, this issue needs to be fixed, as many sites use WYSIWYG editors, and that is certainly not a reason for this module not to work for them...

P.S.: There's another thing you have to do if you use a WYSIWYG editor, though: configure it so that it does not remove line breaks from the source; it's generally better to disable source formatting altogether. I have used such an option in both FCKEditor and TinyMCE, and I guess there's something similar for other editors as well.

Attachment: geshifilter.pages_.inc_.patch (476 bytes)

#7

soxofaan - January 14, 2009 - 07:54
Title:Greater than sign > turns into &gt;» Rich text editor (WYSIWYG): Greater than sign > turns into &gt;

About every wysiwyg editor will convert < and > to &lt; and &gt; that's the correct behaviour

That depends on what the correct behavior should be.
If the rich text editor (RTE aka WYSIWYG editor) is expected to output valid (X)HTML (because it will be directly inserted in a webpage), it is indeed the correct behavior.
In the case that the RTE is just a part of a longer pipeline (e.g. there is also GeSHi filtering), getting valid (X)HTML from the RTE is not necessarily required, as long as the complete pipeline spits out valid (X)HTML.

The patch from #6 is not a solution because it will fail when there is no RTE active.

The only partial solution for this problem I can come up with is an option in the GeSHi filter to decode HTML entities before GeSHi filtering.
If there is no RTE active, this option should be off.
If there is an RTE active, it should be on.
But if the RTE is optional (the user can switch it on and off), there is a problem, as the GeSHi filter cannot know whether it gets plain content or escaped content (see my comment in #3). The GeSHi filter could guess, but that's like opening a can of worms.
And I bet there are also other use cases (with other filters instead of RTEs) where things can get messy.
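A sketch of the entity-decoding option described above, and of why an optional RTE makes it ambiguous. decodeEntities and prepareForHighlighting are hypothetical names, and only four entities are handled for brevity:

```javascript
// Decode a few common entities ("&amp;" last, so "&amp;gt;" becomes "&gt;" once).
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");
}

// Hypothetical filter step: optionally decode, then hand plain code to the highlighter.
function prepareForHighlighting(input, decodeOption) {
  return decodeOption ? decodeEntities(input) : input;
}

// RTE active, option on: "&gt;" came from the editor, so decoding restores ">".
const withRte = prepareForHighlighting("a &gt; b", true);       // "a > b"
// RTE switched off, option still on: the author literally typed "&gt;" in a PHP
// string, and decoding silently changes the code.
const withoutRte = prepareForHighlighting('echo "&gt;";', true); // 'echo ">";'
```

Both calls receive the same kind of text, so nothing inside the filter can tell the two cases apart.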

Anyway, it's a difficult problem to solve in a clean way (also see http://acko.net/blog/safe-string-theory-for-the-web for example) and I think it will only be possible in a structural way when RTE-support in Drupal is a bit more standardized (Drupal 7 I hope).

#8

glorinand - January 14, 2009 - 18:04

Yes, you are quite right. This is certainly not the spot that needs to be patched. It would be much better to make the wysiwygs just output plain text without converting to html entities...

This would work nicely for the Filtered HTML input format, but not for Full HTML, as no conversion to HTML entities would occur there... I can't see any way out here, as there is probably no way for GeSHi to work well with Full HTML: the Full HTML input format will (and should) always have the entities encoded, right? So if the WYSIWYG editor does not know about GeSHi specifically, it just has to output entities...

Of course, Full HTML is not enabled for regular users, but still... an admin would sometimes need to use it, so I guess filters like GeSHi should just work there as well... We really need to think about this carefully. :-)

#9

yyudhistira - March 15, 2009 - 14:32
Status:needs work» needs review

Hi,

I added a new option to the geshifilter settings, called "Enable fix for wysiwyg autoformatting.", which is a checkbox and can be enabled or disabled globally or per input format.

Basically, if you tick this option for an input format that uses a WYSIWYG editor, it does the following:
- It works like the check_plain() function, but rather than calling htmlspecialchars() with $double_encode = true, I use $double_encode = false, which prevents double encoding of characters the WYSIWYG editor has already encoded.
- It changes <br /> to a newline, because the WYSIWYG editor also emits <br /> tags.
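The two steps of the patch can be approximated in JavaScript. This is not the patch itself (which is PHP and calls htmlspecialchars() with $double_encode set to false); the entity pattern below only approximates PHP's check for an existing entity, and converting <br /> before escaping is an assumed ordering. wysiwygFix is a hypothetical name:

```javascript
// "&" followed by something that already looks like an entity: &gt; &#62; &#x3E;
const EXISTING_ENTITY = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/g;

function wysiwygFix(text) {
  // Step 2 of the list, done first here: <br>, <br/> and <br /> become newlines,
  // so they are not escaped into visible "&lt;br /&gt;" text.
  const withNewlines = text.replace(/<br\s*\/?>/gi, "\n");
  // Step 1: like htmlspecialchars(..., ENT_QUOTES) with double_encode = false.
  return withNewlines
    .replace(EXISTING_ENTITY, "&amp;") // escape only a bare "&"
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Editor output: one entity the editor already made, one raw ">", one <br />.
const fixed = wysiwygFix("if (a &gt; b)<br />x > 1 && y");
// "if (a &gt; b)\nx &gt; 1 &amp;&amp; y"
```

Note that the "&gt;" already produced by the editor is left alone, while the raw ">" and the bare "&&" are escaped once.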

Of course, the site maintainer must make sure to tick this option only for input formats that have a WYSIWYG editor enabled. In other words, don't use this option in the global settings, but in the input-format-specific settings. If they still insist on using it without a WYSIWYG editor, the text will still be encoded anyway; the option just prevents double encoding, first by the WYSIWYG editor and then again by geshifilter.

I hope this is useful.

Attachment: geshifilter-290279.patch (2.91 KB)

#10

glorinand - March 16, 2009 - 21:03

Very nice. :-)

#11

Grayside - April 10, 2009 - 20:31

This is a vast improvement. Now, for some review.

If I start with the WYSIWYG turned off, and enter something like
for( $i=0; $i<count($array); $i++ );
it drops most of the code block after the less than operator. If I insert a space just before "count" it works.

The WYSIWYG also seems to compress all the linebreaks out of the text.

This discrepancy only appears in the WYSIWYG editor, not when I save the node:
Also, again in the PHP highlighting mode, if I have
print "html>body><strong>Width</strong></";*
in my source code, I get
print "html>body>Width
in the WYSIWYG editor. Note that the strong tags are not being escaped.
Switching back to source view, I have
print &quot;html&gt;body&gt;<strong>Width</strong><!--";
Note the presence of the HTML comment at the end of the line.

* The strange html>body> text relates to some separate testing of HTML tag behavior that seems to be related to FCKeditor specifically.

#12

Grayside - April 27, 2009 - 15:53

I picked this up again, and somehow the bug with my for-loop seems to have vanished. The HTML artifacts remain.

#13

chinka - September 23, 2009 - 09:02

Thank you yyudhistira. I applied your patch and now it works perfectly well...

#14

fizk - October 7, 2009 - 10:57

The patch helps, but it still doesn't solve the problem completely.

E.g., using the NicEdit editor with the WYSIWYG API module, if I type in:

<php>function($asdf) { }</php>

the <php> tag will be printed out as &lt;php&gt;, and the trailing </php> will show up in the highlighted code section.

Please see the attached screenshot.

Attachment: Screenshot.png (14.38 KB)

#15

nicholasThompson - October 23, 2009 - 14:45

I got around this in FCKEditor by adding the following two lines to all/modules/fckeditor/fckeditor.config.js:

FCKConfig.ProtectedSource.Add( /<pre[\s\S]*?\/pre>/g );
FCKConfig.ProtectedSource.Add( /<code[\s\S]*?\/code>/g );

That basically makes all content inside <pre>...</pre> and <code>...</code> blocks invisible in the "HTML View". This protects it against FCKEditor's nasty/useful habit of entity-encoding everything.
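To see which parts of the HTML those two patterns cover, they can be run with plain String.prototype.match. This is only a demonstration outside FCKeditor (protectedSpans is a hypothetical helper); inside the editor, FCKConfig.ProtectedSource.Add() is what applies them:

```javascript
// The two regular expressions from fckeditor.config.js above.
const protectedPatterns = [/<pre[\s\S]*?\/pre>/g, /<code[\s\S]*?\/code>/g];

// Collect every span of the HTML that one of the patterns would protect.
function protectedSpans(html) {
  return protectedPatterns.flatMap((re) => html.match(re) || []);
}

const sample = "<p>Intro</p><code>if (a > b) { x = 1; }</code><p>More</p>";
const spans = protectedSpans(sample);
// ["<code>if (a > b) { x = 1; }</code>"]
```

Because the quantifier is lazy, each match stops at the first "/pre>" or "/code>", so consecutive blocks are protected separately. The "<p>" paragraphs are not matched, so FCKeditor still processes them as usual.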
