Bad Words filter garbling text on site - Character set connection?

Irene Kraus - September 13, 2008 - 22:51
Project:abuse
Version:5.x-1.0-beta
Component:Code
Category:bug report
Priority:normal
Assigned:Unassigned
Status:postponed (maintainer needs more info)
Description

Bear with me as I try to explain my research on this! I'm also wondering whether this may be related to another bug report filed here:
http://drupal.org/node/153240

Now, the issue. On several occasions over the past few months, I have noticed that content stored on my site took on a 'garbled' appearance, with the text mixed in with strings of '****' characters. I found I could correct this for all of the stored content on my site (Drupal core content types, Image, and FAQ) as long as I stuck with the Full HTML input format. However, the text entered for category descriptions (Image Galleries, FAQ categories) was not corrected.

In fact, I filed a bug report about Drupal's core input filters back in August, as one of them was quite clearly substituting that string of '****' for something:
http://drupal.org/node/300845

I did a complete rebuild of my site after filing those reports, as no one suggested anything that could remedy my problem. Today I came across the 'why' of it by accident while setting up another site: it is the 'Bad Words' filter included as part of the Abuse and Watchlist modules. As long as I do NOT enable that filter for the default Filtered HTML input format, there are no problems. But as soon as it is enabled, everything tends to go bad. Why? I believe it has something to do with the fact that I am using the UTF-8 character set, with all 'collation' settings at utf8_general_ci. (This is the setting created by default when Drupal was installed on my hosting company's web server, and I've stuck to it.)

The books I have on installing and managing Drupal, at least insofar as I've been able to sort out, haven't really told me which collation settings I should use. Some specific tables, in fact, are set to 'latin1_swedish_ci' for some strange reason. In any case, not knowing what I should change there, I've removed the Abuse module from my live site, as I most certainly do NOT want garbled text showing anywhere!

#1

BTMash - October 15, 2008 - 19:22
Status:active » postponed (maintainer needs more info)

I haven't updated the 5.x version of the abuse module in quite a while. However, in its early versions I had not set a utf8 collation for the tables the module creates; I only found that out later.

As for why you were getting **** appearing throughout your text: by turning on the watchlist filter, you allow it to replace words that match a certain regular expression with something 'safe' (so if someone wrote a bad word on your site, it would get filtered out). The filtered results are generally saved in your filter cache. So you needed to specify which bad words should be filtered (or, if you did not want anything filtered, not enable the watchlist filter, or the module, at all), and that should have solved it.
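The thread does not pin down the exact mechanism, and the abuse/watchlist module's real code is not shown here. As a hypothetical illustration only, assume a filter that joins its configured bad-word list into a single whole-word regular expression. The Python sketch below (the names `naive_filter`, `build_pattern` and `filter_text` are made up for this example) shows how an empty word list could produce the kind of symptom described above: the empty alternation matches at every word boundary, so '****' gets wedged around every word. The guarded version treats an empty list as "filter nothing".

```python
import re


def naive_filter(text, bad_words, mask="****"):
    # Hypothetical, NOT the abuse module's code: join the word list into one
    # whole-word pattern without checking whether the list is empty.
    alternation = "|".join(re.escape(w) for w in bad_words)
    # With an empty list this becomes r"\b(?:)\b", which matches the empty
    # string at every word boundary, so the mask is inserted around every word.
    return re.sub(r"\b(?:" + alternation + r")\b", mask, text)


def build_pattern(bad_words):
    # Ignore blank entries; return None when there is nothing to filter.
    words = [w.strip() for w in bad_words if w.strip()]
    if not words:
        return None
    alternation = "|".join(re.escape(w) for w in words)
    # Python 3 str patterns are Unicode-aware by default, so \b also treats
    # non-ASCII letters (decoded from UTF-8) as word characters.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def filter_text(text, bad_words, mask="****"):
    # Guarded version: an empty word list leaves the text untouched.
    pattern = build_pattern(bad_words)
    if pattern is None:
        return text
    return pattern.sub(mask, text)


if __name__ == "__main__":
    print(naive_filter("hello world", []))          # ****hello**** ****world****
    print(filter_text("hello world", []))           # hello world
    print(filter_text("Darn it, darn!", ["darn"]))  # **** it, ****!
```

Returning `None` for an empty list, rather than compiling an empty alternation, is the key design choice here: "no words configured" should mean "change nothing".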
