A site user submitted this article (pretty much as is) to my site, using an input format which uses the core line break converter.
For some reason, I get completely empty HTML output from the node body. It's there in the database, but not rendered at all. Now, there are various markup issues in there, but this shouldn't result in the entire text being stripped.
On HEAD I get this as output for a 640-line document:
<div class="content clear-block">
</div>
fun eh?
I'd found this intermittently with other articles, and for now have 'fixed' it by switching to the line break handling provided by bbcode.module which doesn't cause the same problem. So it's definitely the line-break converter that's causing this.
To reproduce, copy and paste the attached .txt file, submit a post with 'filtered html' and watch the blank output. Try it again without the line break converter and it should show up.
Although this is probably a fairly obscure bug, it's also pretty nasty when it shows up, and has taken me ages to track down (I've yet to compare what bbcode.module and core do differently, which'd probably help with this). So marking as critical on the assumption that others can reproduce with the same input.
Comment | File | Size | Author
---|---|---|---
#5 | large_text_filter.patch | 1.17 KB | redndahead
#2 | numbers.txt | 32.55 KB | redndahead
#1 | shortened_text.txt | 92.85 KB | redndahead
 | cardan.txt | 367.74 KB | catch
Comments
Comment #1
redndahead commented:
After some testing I'm pretty sure it's a size issue. I've attached a text file that will work, but add one more character and it stops working. This text file is 95078 characters long and it dies at the 95079th character. Hopefully this helps.
red
Comment #2
redndahead commented:
Retrying with just numbers in the file, the limit came out at 33331 characters; 33332 did not work. The file is attached.
Comment #3
redndahead commented:
After more testing it looks like the issue occurs at line 903 of filter.module in version 6.0. It looks like this:
Where $chunk is the text. I searched to see if there was a size limit on preg_replace in PHP and there doesn't seem to be, so I'm guessing it's an issue with the regular expression. Unfortunately regular expressions give me twitches, so my skills stop here. Can anyone else help?
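For illustration, the call can be approximated from the pattern quoted in comment #10. The sketch below is hedged: the replacement string, the constant name and the helper name `wrap_paragraphs_checked` are assumptions, not the actual filter.module source. It relies on the fact that `preg_replace()` returns NULL when PCRE reports an error, which would be consistent with the body coming out empty, and it uses `preg_last_error()` to surface the error code.

```php
<?php
// Pattern as quoted in comment #10; the replacement string is an assumption.
const PARAGRAPH_PATTERN = '/\n?(.+?)(?:\n\s*\n|\z)/s';

/**
 * Hypothetical helper: wraps paragraphs in <p> tags and reports PCRE
 * failures instead of silently passing on an empty result.
 */
function wrap_paragraphs_checked(string $chunk): array {
  $result = preg_replace(PARAGRAPH_PATTERN, '<p>$1</p>' . "\n", $chunk);
  if ($result === null) {
    // preg_replace() returns NULL on error; preg_last_error() says which one.
    return ['ok' => false, 'error' => preg_last_error(), 'text' => $chunk];
  }
  return ['ok' => true, 'error' => PREG_NO_ERROR, 'text' => $result];
}

$out = wrap_paragraphs_checked("first paragraph\n\nsecond paragraph");
echo $out['ok'] ? $out['text'] : 'PCRE error code: ' . $out['error'], "\n";
```

Falling back to the unmodified text on failure is one possible design choice; it keeps the post visible, at the cost of missing paragraph tags.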
Comment #4
redndahead commented:
FYI, I added ! delimiters and it fixed the issue. That said, I have no clue what the ! does; I just copied it from another preg_replace in that file. Here is the updated preg_replace.
Comment #5
redndahead commented:
Attached a patch with the above changes. I'm not quite sure it's what would be preferred. It doesn't add p tags around the text.
Comment #6
redndahead commented.
Comment #7
vladimir.dolgopolov commented:
1. I wonder why #4 works. I've tested it and the regex (!/$pattern/s!) doesn't make much sense.
2. Anyway, I've written a script to test different types of regexp delimiters:
I got 33333 bytes.
I think it's a PCRE limitation: http://regexkit.sourceforge.net/Documentation/pcre/pcre.html#SEC3
3. However, here is the test for the issue. IMHO it's a "won't fix".
Comment #8
vladimir.dolgopolov commented:
We should get more info about regexp limitations.
Bug #24460 preg_match crashing on specific pattern/string size
Comment #9
catch commented:
vladimir and redndahead, thanks for taking a look at this. That it's an arbitrary PCRE bug makes sense; however, since there's a possible fix (and I still haven't checked what the other filter does differently) I'm going to mark it back to needs review. I'll try to actually test the patch asap.
Comment #10
vladimir.dolgopolov commented:
Here is a test script for finding the preg_replace limit relevant to this issue:
I got this: Limit for pattern: '/\n?(.+?)(?:\n\s*\n|\z)/s' is '33333' bytes.
Fortunately, the 's' modifier (PCRE_DOTALL) is not used in Drupal core, with a few exceptions like this one.
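A probe of this kind could look roughly like the sketch below. It is not the script from this comment; the function name `find_preg_limit` and the search ceiling are hypothetical, and the binary search assumes that once a subject length fails, every longer subject fails too. The result depends on the PHP version and on the `pcre.backtrack_limit` and `pcre.recursion_limit` ini settings, so it may simply report the ceiling on a setup where nothing fails.

```php
<?php
/**
 * Hypothetical probe: largest subject length (up to $max bytes) for which
 * preg_replace() with $pattern does not return NULL.
 */
function find_preg_limit(string $pattern, int $max = 200000): int {
  $low = 0;      // longest length known to work (0 is trivially fine)
  $high = $max;  // upper bound of the search
  while ($low < $high) {
    // Round up so the loop always makes progress.
    $mid = intdiv($low + $high + 1, 2);
    $subject = str_repeat('1', $mid);
    if (preg_replace($pattern, '<p>$1</p>', $subject) !== null) {
      $low = $mid;       // worked: the limit is at least $mid
    }
    else {
      $high = $mid - 1;  // failed: the limit is below $mid
    }
  }
  return $low;
}

$pattern = '/\n?(.+?)(?:\n\s*\n|\z)/s';
printf("Largest working length for %s: %d bytes (ceiling %d)\n",
  $pattern, find_preg_limit($pattern), 200000);
```

If the printed value equals the ceiling, no failure was observed within the searched range.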
Comment #11
redndahead commented:
So do you feel that it's the modifier that's causing the issue? It's only used for that one "."; can we replace the "." with ^\n? (I read that the "." with /s is used as a way to express "not a newline".)
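As a reference point for this question, the short check below shows how `.` behaves in PCRE with and without the `s` modifier: by default `.` does not match a newline, the `s` modifier (PCRE_DOTALL) makes it match newlines as well, and the character class `[^\n]` is the explicit way to say "anything but a newline". The variable names are illustrative only.

```php
<?php
$subject = "a\nb";

// Without modifiers, "." does not match the newline between a and b.
$plain = preg_match('/a.b/', $subject);

// With the s modifier (PCRE_DOTALL), "." matches the newline too.
$dotall = preg_match('/a.b/s', $subject);

// A negated class excludes the newline regardless of the s modifier.
$negated = preg_match('/a[^\n]b/s', $subject);

var_dump($plain, $dotall, $negated); // int(0) int(1) int(0)
```

So swapping `.` for `[^\n]` in the paragraph pattern would change what a paragraph can contain: it could no longer span a single line break.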
Comment #12
vladimir.dolgopolov commented:
@redndahead Yes, you are right. I did some investigating: the s modifier is not what causes the issue.
I've written another script with modified versions of the regexp.
Things to think about.
Comment #13
redndahead commented:
I've narrowed it down further to the \z, in other words the check for the end of the text. Using $ doesn't work either. I've asked some smart Drupal people and they're not sure what to do. Maybe someone who really knows regex can figure this out.
Comment #14
dana_johnson commented:
Thank you for finding the solution in #4. I just discovered this problem today and the solution works for me.
Comment #15
redndahead commented:
Sorry for the long update. It was discovered that my patch above doesn't actually wrap p tags around any paragraphs. This is bad, so this needs some work. I'm out of my league now. Does anyone else want to pick this up? I have narrowed it even further to:
|\z
This part of the regex kills the text. Anyone else that can run with this?
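One way to sidestep the lazy `(.+?)` followed by `|\z` altogether is to split the text on blank lines and wrap each piece, so no single match has to span a whole paragraph. The sketch below is only an illustration of that idea: `autop_by_split` is a hypothetical name, this is neither the filter.module code nor the patch in #5, and it does not reproduce everything the line break converter does.

```php
<?php
/**
 * Hypothetical alternative: split on blank lines, then wrap each piece
 * in <p> tags. Single line breaks inside a paragraph are left untouched.
 */
function autop_by_split(string $chunk): string {
  // A "blank line" here is a newline, optional whitespace, and a newline.
  $paragraphs = preg_split('/\n\s*\n/', $chunk, -1, PREG_SPLIT_NO_EMPTY);
  if ($paragraphs === false) {
    // preg_split() failed; hand the text back unchanged rather than empty.
    return $chunk;
  }
  $html = '';
  foreach ($paragraphs as $paragraph) {
    $paragraph = trim($paragraph, "\n");
    if ($paragraph !== '') {
      $html .= '<p>' . $paragraph . "</p>\n";
    }
  }
  return $html;
}

echo autop_by_split("first paragraph\n\nsecond paragraph");
```

Whether this matches the converter's output closely enough for real content would need checking against the existing filter tests.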
Comment #16
catch commented:
Marking as duplicate, this is an old one apparently :(
http://drupal.org/node/133188
Comment #17
yngens commented:
Subscribe. Now I'm not sure what to do. Is #4 only a temporary solution?
Comment #18
bambangfals commented:
I still have a problem with my long post. It still doesn't show under the title of the posting... any other solutions?