Closed (won't fix)
Project:
Bot
Version:
7.x-1.x-dev
Component:
Code
Priority:
Major
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
11 Jan 2013 at 23:52 UTC
Updated:
28 Mar 2017 at 10:40 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
kid_icarus commentedhttp://www.youtube.com/watch?v=iapcKVn7DdY
Comment #2
morbus iffThis is a bit of a known issue, though never really publicized.
There was some attempt to fix a similar issue long ago on line 82+:
http://drupalcode.org/project/bot.git/blob/refs/heads/7.x-1.x:/bot.smart...
But it didn't catch all of them, of course. I don't currently have a fix in mind.
Comment #3
morbus iffFor fire, the error I'm seeing is:
PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x94\xA5' for column 'message' at row 1: INSERT INTO {bot_log} (type, timestamp, channel, nick, message) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4); Array ( [:db_insert_placeholder_0] => 2 [:db_insert_placeholder_1] => 1357946360 [:db_insert_placeholder_2] => #drupal-colorado [:db_insert_placeholder_3] => BLadwin [:db_insert_placeholder_4] =>
Comment #4
morbus iffBased on http://stackoverflow.com/questions/11936950/inserting-utf-8-encoded-stri...,
it seems that the version of MySQL that the bot server is running is at fault.
Comment #5
dave reidConverting the tables and database that run the bot to utf8 charset and utf8_general_ci collation should fix it.
Comment #6
morbus iffDave: The tables and DB already are UTF8, as far as I can tell.
SELECT character_set_name FROM information_schema.`COLUMNS` C WHERE table_schema = "druplicon_drupal" AND table_name = "bot_log" AND column_name = "message";"andSHOW CREATE TABLE bot_logboth indicate utf8.Why don't you think it's the four-byte-character problem above?
Comment #7
kid_icarus commentedJust to test, on a vanilla drupal 7 install with bot module, I tested the bot_log module and simply threw in a panda face into the logged channel. My bot crashed.
Then I verified that the db and tables were using the utf8 charset and utf8_general_ci collation.
I then ran the following query to convert the db and tables to 4-byte utf8:
I then restarted the bot, inserted many panda faces, and the bot didn't crash.
I'm not sure what we can do to check if input is outside of the 3-byte utf8 range. The utf8mb4 character set is unfortunately only available to MySQL >= 5.5
Comment #8
kid_icarus commentedThis is an interesting article for context: http://mathiasbynens.be/notes/mysql-utf8mb4
Comment #9
morbus iffThis is almost beginning to sound like a default worthy of core.
Comment #10
morbus iffCovered in #1314214, it seems.
Comment #11
Drave Robber commentedWould brutally removing all astral plane characters -
$data->message = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '_', $data->message);- solve this?(replacing with underscore instead of removing in order not to end up with empty messages)
On a second thought, this probably should be configurable - "turn this on if you cannot make your db utf8mb4 but still don't want wannabe vandals to take down your bot".
Comment #12
Drave Robber commentedA patch to that effect attached.
Works for me™; whether the default should be ON or OFF is a potential matter of discussion.
Comment #13
sreynen commented#1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) looks like it will add a setting to change the collation from utf8 to utf8mb4, but we'll still have databases using utf8, so bot.module needs to deal with that. However, this doesn't seem like something that needs to be exposed to end users. Couldn't we just check the collation on the column and strip 4-byte characters automatically where we know they won't save?
Comment #14
Drave Robber commentedMakes sense.
I will need to read into PDO docs to figure out how to accomplish this.
Comment #15
kid_icarus commentedSo this seems to work, but it's limited to MySQL. I don't think all this is doable in PDO. Any thoughts?
Comment #16
kid_icarus commentedWhoops, one line was in the wrong place.
Comment #17
kid_icarus commentedThat regex wasn't working, but this new one is.
See http://stackoverflow.com/a/10568589, essentially we're replacing all "Private Use Area" unicode characters. The original regex ended up replacing just about every character with '_'.
Comment #18
bgm commentedI also ran into this problem while writing a small module that allows me to post a new node from IRC.
Solved it by adding a try/catch statement around my function, ex:
This bug affects bot_factoid, bot_karma and bot_tell as well.
Comment #19
Drave Robber commentedCross-linking for reference - Twitter also meets this issue: #1910376: SQL error when importing tweet with emoji
Comment #20
florisg commentedConfirmed #17 solves the issue.
php smartIRC does crash on any character above FFFD (in theory it could be FFFF but the last two are not defined.
So i replaced at line 107 of bot.smartirc.inc:
Tested and it works flawlesly
Great work kid_icarus for finding the solution
Cudo's to Morbus Iff
Comment #21
markhalliwellComment #22
markhalliwellReferencing https://drupal.org/node/2043439#comment-8213973:
We should probably rename this to
_bot_db_sanitizeor something and wrap all the sub-modules that process the raw text before sending to the DB.edit: The reason this is longer is because it includes most of the emojicon unicodes, not just a select few.
Comment #23
markhalliwellSeveral of the sub-modules need this fix before it's saved in the DB.
Posting on behalf of Jeremy.
Comment #24
pwolanin commentedJust ran into this as well - Emoji seem to cause Druplicon to leave the channel (crash)
Comment #25
cafuego commentedPatch #23 works for me! :-)
Comment #26
a.ross commentedErr, yeah. I'm really sorry for crashing you today, Druplicon! Please do accept my apologies.
Comment #27
a.ross commentedHmm, what if we add a
module_exists('transliteration')here, and use that if it exists?Comment #28
joelpittetHow do you get it to come back after it's killed? :( Sorry druplicon!
#mayo
Comment #29
cafuego commentedSomeone with access needs to restart it on whatever host it runs on, I guess. Webmasters queue?
Comment #30
ciss commentedShouldn't this be handled in bot_smartirc_wrapper::invoke()?
Comment #31
dave reidComment #32
dave reidBumping priority.
Comment #33
morbus iffSee also https://www.drupal.org/node/2488180
At this point, I'm not interested in stripping characters - I'd rather the underlying MySQL drivers and database support the characters.
Comment #34
heddn@Morbus Iff, any chance we can get #2488180: Support full UTF-8 (emojis, Asian symbols, mathematical symbols) on MySQL and other database drivers when they are configured to allow it applied to the bot infrastructure so we can get it RTBCed? Or #23 applied? It just went down (and restarted itself again).
Comment #35
rfayAnd of course now, since Drupal 7.50, the bot Drupal instance "just" needs to be updated to 7.50 and the instructions over in Multi-byte UTF-8 support in Drupal 7 have to be completed. But YAY that the underlying issue has finally been solved.
Comment #36
ricardoamaro commentednote: to review
Comment #37
morbus iffClosing, as it's been fixed upstream in Drupal core.