Ok this is kind of strange. One of my users decided to drag and drop a bunch of photos from iphoto on her mac (using firefox) into a newsletter (simplenews) post, rather than using image upload button (WYSIWYG image upload - Inline images for your WYSIWYG 6.x-2.9 -- which works great BTW) We have the latest ckeditor installed via Wysiwyg 6.x-2.3 that she uses

somehow the editor decided to encode using base64:
<img alt="" src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gxYSUNDX1BST0ZJTEUAAQEAAAxITGlubwIQ .. etc

which I guess is OK but with 4-5 photos of reasonable size viewing the post caused the server loads to jump to insane levels.. http taking 100% etc .. it shut us down if more than one or two folk were viewing the post. The funny thing is that they show up fine in the editor -- she just thought she was making a nifty post doing that.. it's when you save and publish when the pain hits. Not really sure why this hit apache so hard displaying the post (when in the editor it looked fine)

So I sysadmin a host for about 10 or so drupal sites that my friends use and this one post basically shut us all down .. and it took me a while to figure out what the hell was going on.

Maybe not a bug.. just a FYI. You can destroy a good linux server just by doing this on one post .. i was amazed .. loads up to 20-30 on the whole system just viewing this post w/images over about 20 mins (I removed images and load was back to .6 to 1 max)

Drupal core 6.20
latest ckeditor version
system info: dual core : Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping : 2
cpu MHz : 3400.251
cache size : 2048 KB
2 gigs memory

CentOS release 5.6 (Final)
PHP 5.2.10

I'm not really sure if this is the place to post this .. seems related to ckeditor but it might be something else. All I know is dragging and dropping from iphoto completely SCREWED my server.. so watch out .. once I cleaned up the post and told the user how to use the wysiwyg uploader everything was fine again

WHEW!
-Zachary (the issue was on http://www.valleyflorafarm.com)

P.S. the post is saved if you need or want any more info.. i cloned the original and had her upload the pics via wysiwyg image uploader and everything is back to normal loads. Just a strange thing I thought you all should know about

thanks again

-Zachary

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dczepierga’s picture

Project: CKEditor 4 - WYSIWYG HTML editor » Wysiwyg
Version: 6.x-1.4 » 6.x-2.x-dev

U use WYSIWYG module with CKEditor library so I move this issue to correct bugtracker.

Greetings

sun’s picture

Title: CPU killer - FYI -- drag and drop images from iphoto into editor will kill your site » CKEditor on FF/Mac: Drag & drop of images into editor inserts them as base64-encoded data src attribute value
Component: Code » Editor - CKEditor
NITEMAN’s picture

We are having a similar issue with TinyMCE FF >4.x

Is there any way for WYSIWYG module to prevent this kind of insertion on DB? (things start to complicate when values get inserted: http://drupal.org/node/1271154 )

Thanks in advance!

Best regards

killua99’s picture

sub

TwoD’s picture

Category: bug » support

I'm not surprised this would cause serious performance problems on both the server and client.
After all, the entire image file's contents are being inserted as a binary blob in the HTML markup!
After that it won't be possible to download that node without having those blobs (which I guess are several megabyts large) fetched from the database, passed through all active content filters, downloaded to your browser and finally displayed. On every page load.

There's no bug in Wysiwyg or the editor here, but possibly a usability issue in the behaviour of dragging and dropping from iPhoto.
What's happening is that iPhoto passes the whole image file as a binary blob to the browser on the DnD event, which reformats it slightly to be acceptable HTML and inserts it all into the WYSIWYG editing area. The browser has no idea how Drupal or any other website works so it can't upload the file as an actual file the way the server would accept it. But it does know binary data is accepted in src attributes if base64 encoded. It performs well while editing because it already has access to the binary data from memory, but once it has to fetch it from the server everything falls in place and you've got yourself a server killer. ;)

The only thing I can think of that would prevent [most] of these problems is a validation callback on every textarea that would scrub the contents of src attributes that don't have a proper URI. The binary blobs would still get uploaded, with all the client/server stress that comes with that, but it would hopefully never reach the database.
You could also use a JavaScript to try to identify these src attributes to prevent accidental "uploads", but malicious users could simply turn it off.

NITEMAN’s picture

Title: CKEditor on FF/Mac: Drag & drop of images into editor inserts them as base64-encoded data src attribute value » CKEditor&TinyMCE on FF: Drag & drop of images into editor inserts them as base64-encoded data src attribute value
Component: Editor - CKEditor » Code

Well, you're right... this is a performance nightmare.

Facts: I've been able to reproduce the issue with TinyMCE on FF 5 drag'n'dropping from Windows Explorer.

Trying to identify these strings server side via preg_replace causes php to die silently (and the associated Apache thread with segfault). I'm using this regex: '/\"(data\:)?image\/(png|gif|jpg)\;base64\,(\d|\s|[a-z]|[A-Z]|\+|=|\/)+\"/' and replacing with '/imgpug/nobase64.png' (an image saying no drag'n'drop allowed)... but only works with very small images.

Having in mind that a malicious user can pass such a string (or even worse, a similar amount of random characters)... I think wysiwyg module may help dealing with this kind on trouble replacing data via JavaScript in the editor detach phase.

Could anybody please point me where to write the JS replace code? (I guess it will fit in editors/js/tinymce-3.js for my case, but a general fix will certainly be better).

Thanks in advance!

PS: Changed tittle and component to better describe the issue

TwoD’s picture

Title: CKEditor&TinyMCE on FF: Drag & drop of images into editor inserts them as base64-encoded data src attribute value » Images inserted as binary blobs on Drag & Drop.
Version: 6.x-2.x-dev » 7.x-2.x-dev

I put together a patch doing output filtering for #550428: Tags like "&nbsp;" or "<p>&nbsp;</p>" or "<br />" added to empty textareas. which could be made more abstract to run a bunch of more filters on content. There are only implementations for a few editors in that patch, but it should be very easy to implement for all of them.
That still does not cover the case of someone entering a huge blob manually without an editor.

Just in case this isn't related to just FF and/or some editors, I'll simplify the title a bit. Will run some tests when I get home.

NITEMAN’s picture

Right now we are dealing with this kind of content with an input filter which only prevents such content to be rendered or passed to other filters.

BTW my regex was wrong, correct one is '~"(?:data:)?image/(?:png|gif|jpg|jpeg);base64,[+=\w\s\d\/]*"~imu'

I'm sure a patch allowing custom replaces of strings (some kind of unified postProccessing on client side) will be a nice addition!

I'm sorry I can't figure where to put the replace call, maybe in wysiwyg/editors/js/tinymce-3.js changing this line data.content = Drupal.wysiwyg.plugins[plugin].detach(data.content, pluginSettings, ed.id); ?????

Thank you for your support!

TwoD’s picture

Check the patch in the issue I linked to above. I didn't put the replace directly in the editor implementation so the exact same filter code can be reused by mu,tiple editors. It would also be possible to let a cross-editor plugin (currently works with TinyMCE, CKEditor and FCKeditor) do a replace call in their detach() method without the need to patch Wysiwyg.

NITEMAN’s picture

The patch in that issue (http://drupal.org/node/550428#comment-3677828?) only applies to wysiwyg.js, ckeditor-3.0.js, fckeditor-2.6.js... noTinyMCE

So, again:

I'm sorry I can't figure where to put the replace call, maybe in wysiwyg/editors/js/tinymce-3.js changing this line
data.content = Drupal.wysiwyg.plugins[plugin].detach(data.content, pluginSettings, ed.id);
?????

Thank you again for your support.

Best regards

das-peter’s picture

If we can identify dragged-in images (yes we can!), we can handle them too.
Is there any reason why not to handle them as regular file uploads?
If not, I'd be happy to write a patch for this :)

Plan:

  1. Identify dragged-in files
  2. Parse data data:image/jpeg;base64,/9j/4AAQSkZJR
  3. Create file out of parsed data and handle it like a regular file upload/creation
  4. Replace the data in the editor by the url of the created file
TwoD’s picture

@NITEMAN, My point was to do it the same way for TinyMCE as I did for the other editors in that patch.
You were on the right track in #8, just after that line we could add something like:

data.content = Drupal.wysiwyg.removeBlankTags(data.content);

to get the same "blank tags" cleanups done for TinyMCE. In this case we should of course rename that function to "cleanupContent" or similar, and add the actual code to remove the binary blobs in there (hoping the browser won't choke if there are many blobs).

@das-peter, I don't have much experience with DnD detection on the client, but maybe you meant to do it on the server as some pre-processing step? If DnD could be relyably detected, and the data submitted properly, that would make a great [cross-]editor plugin. Dealing with file uploads isn't currently in Wysiwyg module's scope - we've got enough to worry about without that :( - so I hope that doesn't discourage you from trying.

das-peter’s picture

Status: Active » Needs review
FileSize
4.01 KB

TwoD thanks for your feedback - I take that as "Yes, would be a nice to have" :)
And here we go.

The attached patch registers an #element_validation callback to do this stuff:

  1. Check input for embedded files
  2. Check if the embedded file is an allowed type
  3. Store the file by using file_save_data()
  4. Ensure a second time the defined mime type matches to the one determined by file_get_mimetype()
  5. Replace the embedded file by the now available proper file url

One disadvantage is that the file isn't yet registered in the file_usage table. I'm not sure how this could be solved in a usable way. My only idea is to see how the Media Module has solved this problem. (As far as I know the use a filter)

Liam Morland’s picture

You can also prevent images with data URLs in CKEditor be removing them on paste or drag with a CKEditor plugin like this:

CKEDITOR.on('instanceReady', function (ev) {
	ev.editor.on('paste', function (ev) {
		ev.data.html = ev.data.html.replace(/<img( [^>]*)?>/gi, '');
	});

	ev.editor.document.on('DOMNodeInserted', function (ev) {
		var target = ev.data.getTarget();
		if (target.is && target.is('img') && target.getAttribute('src') && target.getAttribute('src').substr(0, 5) === 'data:') {
			target.remove();
		}
	});
});
ascii122’s picture

thanks for checking up on this. I just told my users DON'T DRAG AND DROP IMAGES .. so that's ok since I know all my folk with wysiwyg privs ..

It really can fubar your system for sure -- be kind of scary if you allowed guests or unknown users to have those kinds of posting privs .. you could drop a site in a heart beat :(

cheers

-z

Taxoman’s picture

Should this be a feature request instead of a support request?

TwoD’s picture

Title: Images inserted as binary blobs on Drag & Drop. » Handle images inserted as binary blobs on Drag & Drop.
Category: support » feature
Status: Needs review » Needs work

Yes, let's make this a feature request.

+++ b/wysiwyg.moduleundefined
@@ -272,6 +278,76 @@ function wysiwyg_pre_render_text_format($element) {
+        $data_start = preg_quote(substr($matches[1][$index], 0, 100), '/');
+        $data_end = preg_quote(substr($matches[1][$index], -100), '/');
+        $replacement_regexp = '/' . $data_start . '[+=\\w\\s\\d\\/]+' . $data_end . '/';

I wonder if not converting the image data to a regexp and just using preg_replace('image data string', $file_create_url(...)) would avoid the "regular expression is too large" error.

Anyway, unless we can solve the issue of files not being registered (and deleted when no longer referenced), I think we might have to rely on other modules to do this. I mean, it's not easy for Wysiwyg itself to know how to manage these uploaded images.
As far as I can tell, there's not really anything in the patch which needs Wysiwyg to function, so it could be contained in its own [sub-]module. Drag & Drop images do become a problem because of Wysiwyg - or maybe the editors as they don't prevent it - so I can't really decide what to do here...

das-peter’s picture

I thought about moving the handling as defined in #13 into the media or the media browser plus module. The only necessary change I see, would be to use the already available wysiwyg plugin of media to reference the uploaded images.

squarecandy’s picture

At the heart of it, this is a firefox issue.
I've started a bugzilla ticket to try and get FF to disable this "feature" by default.

https://bugzilla.mozilla.org/show_bug.cgi?id=729587

Please go and leave a comment there if you would like to see this resolved at the browser level.

Viybel’s picture

I left a comment in the bugzilla.mozilla thread:

This should not be the default behaviour. It creates gigantic base64 strings inside content, which I've seen crash a pretty beefy LAMP server (probably due to the CMS trying to parse it at some point).

The Drupal community is trying to handle this problem with JS or PHP based regex replace, but these are CPU-intensive too and it doesn't seem fair to me that web developers and sysadmins all over the world should have to deal with an arbitrary and unjustified change of behaviour from just one web browser.

IMHO, we should try and convince Mozilla devs to revert to the standard behavior. Does anyone have an idea how we could reach them besides commenting to their (pretty crowded) Bugzilla?

I don't now if there's an RFC mentioning this but I guess there should be, if it affects every system in the world using a browser-based wysiwyg editor.

Vianney

squarecandy’s picture

Title: Handle images inserted as binary blobs on Drag & Drop. » Firefox handling of Drag & Drop images: inserted as binary blobs in WYSIWYG editors.
Assigned: Unassigned » squarecandy

Yes anyone with any ideas of how to get this visible as an issue with Mozilla, please help!

Dane Powell’s picture

I don't think this is Mozilla's fault in any way- their change just exposed an existing flaw in Drupal and other services. If simply putting tons of data (not necessarily an image even - potentially just LOTS of text) can bring down a site like this, I'd say that's a problem with the site, not the browser!

In fact, I see this as a huge improvement in usability for the end-user. I'd much prefer to drag-and-drop images instead of having to go through all of the menus and clicking to manually upload them and then link to them in the text area.

At any rate- the patch in #13 is awesome, thanks das-peter! Is there any chance this could be turned into a generic input filter, so it can be used with other editors (and also ported to D6)?

GBurg’s picture

Want to slightly alter your code, as it makes pasting non html not possible anymore:

CKEDITOR.on('instanceReady', function (ev) {
  ev.editor.on('paste', function (ev) {
    if (!(typeof ev.data.html === "undefined")) {
      ev.data.html = ev.data.html.replace(/<img( [^>]*)?>/gi, '');
    }
  });     

  ev.editor.document.on('DOMNodeInserted', function (ev) {
    var target = ev.data.getTarget();
    if (target.is && target.is('img') && target.getAttribute('src') && target.getAttribute('src').substr(0, 5) === 'data:') {
      target.remove();
    }
  });
});
prinds’s picture

I have created a small sandbox project from the code in #23..

http://drupal.org/sandbox/prinds/1909644

browse code here : http://drupalcode.org/sandbox/prinds/1909644.git/tree

Hope you'll find it useful..

TwoD’s picture

A critical problem with this patch is that it unconditionally allows file uploading, even from anonymous comments or other places where you'd normally have a WYSIWYG editor with very limited capabilities. There's no way to control which formats allow file uploading, where files go, how large they may be, etc. We already have several solutions for this implemented by other modules, but this sidesteps all of the security measures used by them. We also trust that the MIME-type before the encoded data is accurate. I'm not sure what happens if you inline an executable file instead, change the MIME string to say "image/png", have Drupal store it, then fetch it back again (or worse, have something else try to run it directly on the server).

The server also doesn't know when a file is no longer needed. We'd have to analyze the contents before and after saving anything to the database to know if any embedded files were removed. Which also means we need the files to be reference tracked in the file usage table.

@Dane Powell, I can't see how this inline-upload feature has any real world use cases as a default behavior.
If the user adds enough images to any content, or a single large one, it's very easy for them to essentially make a DOS attack on the server. If bandwidth isn't an issue, a large enough amount of data will hit PHP execution timeouts, database query timeouts or size limits, HTTP timeouts, etc. There's nothing the CMS itself can do about that. Even what this patch could accomplish will only be successful if we don't hit any of those hard server limitations before/during the content pre-processing. If any of those limits are hit, the user will either see a WSOD or a 500 Internal Server error. If there were multiple files in the content, some may have been saved to disk already, making cleanup a pain, not to mention the text parts contents won't have been saved yet.

Submitting huge amounts of data to the server could of course be accomplished with plain text as well, but it's very uncommon for users to unknowingly type several megabytes worth of content into a single node or comment and push it up to the server.

Erik Seifert’s picture

osopolar’s picture

For TinyMCE the current default behavior seams to be ignoring the drop event (http://www.tinymce.com/wiki.php/Configuration:paste_data_images).

On server-side could be a validate function to prevent the form from being saved if there are inline-images or as a workaround the length of the textarea could be limited to prevent inserting large inline images.

TwoD’s picture

The problem with serverside limits is that by then the (complete) content has already been pushed across the wire, unless you limit the allowed size of the actual HTTP request, so DOS attacks are still possible this way.

osopolar’s picture

IMHO it's no so much about preventing DOS attacks, but prevent that users accidentally (or intentionally) fill the database with large images/data resulting in other problems like WSOD due to problems of some input filters with large data like spamspan: #1243042: Spamspan incompatible with big base64 inline images.

To prevent DOS attacks it should be fine to limit max post size. But there should be something to prevent users to drag and drop images into Textarea. This is fixed in TinyMCE 4.x but there is still no out of the box update (see #1968318: Support for TinyMCE 4.x).

kevinnivek’s picture

I realize this is an old issue, but database slowness as a result of too many dragged-and-dropped images into the WYSIWYG editor is still a potential problem for busy sites (especially with lots of end users).

I wrote a drush command to clean up your node content fields and convert the base64 encoded data to files , thereby offloading the burden from mysql to serve images. Take a look here for a more technical explanation : https://www.shift8web.ca/blog/2016/07/speed-drupal-clean-base64-encoded-...

Or you can get the module straight from Github : https://github.com/stardothosting/Drupal-Star-Image-Clean

Thanks!

steinmb’s picture

Assigned: squarecandy » Unassigned

I do not think @squarecandy still work on this